Exploiting State-of-the-Art x86 Architectures in Scientific Computing

2012 11th International Symposium on Parallel and Distributed Computing. Pub Date: 2012-06-25. DOI: 10.1109/ISPDC.2012.15

A. Heinecke, T. Auckenthaler, C. Trinitis


Citations: 2

Abstract



In recent years, general-purpose x86 architectures have undergone significant modifications towards high performance computing capabilities. Lately, technologies like wider vector units or Fused Multiply-Add (FMA) instructions, which were mainly known from GPU architectures, have been introduced. In this paper, we examine the performance of current x86 architectures, namely Intel Sandy Bridge and AMD Bulldozer, for four parallel workloads with different properties. These properties comprise optimally cache-blocked algorithms as well as adaptive grid structures resulting in memory-latency-bound and bandwidth-bound executions. The achieved performance on both architectures is very promising and, if extrapolated towards upcoming server silicon, can be regarded as on par with current high-end GPU-based accelerators.
