The Difficult Balance Between Modern Hardware and Conventional CPUs

Proceedings of the 19th International Workshop on Data Management on New Hardware. Pub Date: 2023-06-18. DOI: 10.1145/3592980.3595314

Fabio Maschi, G. Alonso


Citations: 1

Abstract

Research has demonstrated the potential of accelerators in a wide range of use cases. However, there is a growing imbalance between modern hardware and the CPUs that submit the workload. Recent studies of GPUs on real systems have shown that many servers are often needed per accelerator to generate a high enough load so the computing power is leveraged. This fact is often ignored in research, although it often determines the actual feasibility and overall efficiency of a deployment. In this paper, we conduct a detailed study of the possible configurations and overall cost efficiency of deploying an FPGA-based accelerator on a commercial search engine. First, we show that there are many possible configurations balancing the upstream system and the way the accelerator is configured. Of these configurations, not all of them are suitable in practice, even if they provide some of the highest throughput. Second, we analyse the cost of a deployment capable of sustaining the required workload of the commercial search engine. We examine deployments both on-premises and in the cloud with and without FPGAs and with different board models. The results show that, while FPGAs have the potential to significantly improve overall performance, the performance imbalance between their host CPUs and the FPGAs can make the deployments economically unattractive. These findings are intended to inform the development and deployment of accelerators by showing what is needed on the CPU side to make them effective and also to provide important insights into their end-to-end integration within existing systems.
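The imbalance described in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function names, the cost model, and every number in it are hypothetical and are not taken from the paper. It assumes each host CPU can submit requests at a fixed rate, each accelerator can process requests at a fixed rate, and the deployment cost is simply the sum of per-unit prices.

```python
import math


def hosts_per_accelerator(accel_qps: float, host_feed_qps: float) -> int:
    """Hosts needed to generate enough load to saturate one accelerator."""
    return math.ceil(accel_qps / host_feed_qps)


def accelerated_cost(target_qps: float, accel_qps: float, host_feed_qps: float,
                     accel_price: float, host_price: float) -> float:
    """Cost of an accelerated deployment that sustains target_qps.

    Hypothetical model: enough accelerators to process the load, and
    enough hosts to submit it (at least one host per accelerator).
    """
    accelerators = math.ceil(target_qps / accel_qps)
    hosts = max(accelerators, math.ceil(target_qps / host_feed_qps))
    return accelerators * accel_price + hosts * host_price


def cpu_only_cost(target_qps: float, cpu_qps: float, host_price: float) -> float:
    """Cost of a CPU-only deployment that sustains target_qps."""
    return math.ceil(target_qps / cpu_qps) * host_price


# Illustrative numbers only (assumptions, not measurements from the paper).
TARGET_QPS = 1_000_000     # required workload
ACCEL_QPS = 500_000        # throughput of one accelerator
HOST_FEED_QPS = 50_000     # load one host can submit to an accelerator
CPU_QPS = 40_000           # throughput of one CPU-only server
ACCEL_PRICE = 8_000        # price per accelerator board
HOST_PRICE = 5_000         # price per host server

print(hosts_per_accelerator(ACCEL_QPS, HOST_FEED_QPS))
print(accelerated_cost(TARGET_QPS, ACCEL_QPS, HOST_FEED_QPS, ACCEL_PRICE, HOST_PRICE))
print(cpu_only_cost(TARGET_QPS, CPU_QPS, HOST_PRICE))
```

With these made-up inputs, ten hosts are needed per accelerator, and the host servers account for most of the accelerated deployment's cost. This is the kind of effect the abstract points to: high accelerator throughput does not by itself make a deployment economical if the CPUs feeding it cannot keep up.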

