Scalable Deep Learning Inference: Algorithmic Approach

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Pub Date: 2020-05-01
DOI: 10.1109/IPDPSW50202.2020.00166

Minsik Cho



Abstract

Large-scale deep learning training has made significant progress in the last few years: more powerful systems and accelerators have been delivered (e.g., the Summit cluster), innovative training mechanisms have been designed (e.g., sophisticated hyperparameter tuning), and advanced communication techniques have been put to use (e.g., async-SGD). Deep learning inference, however, has rather limited options when it comes to scaling up the model density per device. Quantization to lower precision can help, as can sparsification techniques such as pruning and compression, yet both are constrained by the underlying hardware architecture and by their efficacy.
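As a rough illustration of the two techniques the abstract names, the sketch below applies symmetric int8 quantization and magnitude pruning to a toy weight list. It is not the method of the paper, whose details are not given here; the function names, the sample weights, and the 50% sparsity level are all assumptions chosen for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical toy weights, not taken from any real model.
weights = [0.50, -1.27, 0.03, 0.90, -0.02, 0.31]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, sparsity=0.5)

print(q)       # integers that fit in one byte each
print(pruned)  # half of the entries are now zero
```

Quantization shrinks each weight from 32 bits to 8 at the cost of a rounding error of at most half the scale per weight. Pruning as written produces unstructured zeros, which save memory or time only when the storage format and the hardware can skip them; this is one way the hardware dependence mentioned in the abstract shows up.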
