Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Published: 2015-03-14. DOI: 10.1145/2694344.2694347

Johann Hauswald, M. Laurenzano, Yunqi Zhang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars


Citations: 228

Abstract


As user demand scales for intelligent personal assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current datacenter architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle to addressing this question. In this paper, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of 7 benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 10x and 16x, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of datacenters by 2.6x and 1.4x, respectively.

