Exploiting State-of-the-Art x86 Architectures in Scientific Computing
A. Heinecke, T. Auckenthaler, C. Trinitis
2012 11th International Symposium on Parallel and Distributed Computing, 2012-06-25. DOI: https://doi.org/10.1109/ISPDC.2012.15
In recent years, general-purpose x86 architectures have undergone significant modifications towards high-performance computing capabilities. Lately, technologies such as wider vector units and Fused Multiply-Add (FMA) instructions, previously known mainly from GPU architectures, have been introduced. In this paper, we examine the performance of current x86 architectures, namely Intel Sandy Bridge and AMD Bulldozer, for four parallel workloads with different properties. These workloads include optimally cache-blocked algorithms as well as adaptive grid structures, which result in memory-latency-bound and bandwidth-bound executions. The achieved performance on both architectures is very promising and, if extrapolated to upcoming server silicon, can be regarded as on par with current high-end GPU-based accelerators.
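To make the term "cache-blocked" concrete, the following is a minimal sketch of a blocked (tiled) matrix multiplication next to a naive one. It is not taken from the paper: the function names `matmul_naive` and `matmul_blocked` and the `block` parameter are hypothetical, and pure Python only illustrates the loop structure, not the performance. The innermost update `c += a * b` is the multiply-add pattern that FMA instructions and wide vector units target in a compiled implementation.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_blocked(a, b, block=2):
    """Blocked variant: the same arithmetic, visited tile by tile.

    `block` is an illustrative tile size (an assumption); in a compiled
    kernel it would be chosen so that the working tiles fit in cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles of the iteration space.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Inner loops stay within one tile, reusing its data.
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            # Multiply-add: the pattern FMA fuses.
                            c[i][j] += a_ip * b[p][j]
    return c
```

Both functions compute the same product; only the order in which the operands are visited differs. On real hardware that ordering determines how often data is reused from cache, which is what separates a compute-bound kernel from a bandwidth-bound one.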