STAC-A2 on Intel Architecture: From Scalar Code to Heterogeneous Application

Evgeny Fiksman, S. Salahuddin
Published in: 2014 Seventh Workshop on High Performance Computational Finance (WHPCF), 2014-11-16
DOI: 10.1109/WHPCF.2014.6
Citations: 3

Abstract

STAC-A2™ is a compute- and memory-intensive industry benchmark in the field of market risk analysis. The benchmark specifications were created by the Securities Technology Analysis Center (STAC®) and are based on input collected from leading trading companies, universities, and high performance computing vendors. The specifications describe models that represent realistic market risk analysis workloads. In this paper we discuss the development steps that led to competitive performance of the STAC-A2 benchmark on systems consisting of Intel® Xeon® processor(s) and an Intel® Xeon Phi™ coprocessor. We show the importance of utilizing all parallel resources available on Intel architectures to achieve maximum performance. We demonstrate that the offload extension supported by Intel® Composer XE minimizes the effort required to create accelerated applications, using only the C/C++ language. With Intel's latest implementation of the STAC-A2 benchmark, we achieved a significant (800%) performance gain by using a heterogeneous approach on two Intel Xeon E5-2699 v3 processors and a single Intel® Xeon Phi™ 7120A card, compared to an earlier version running on only two Intel Xeon E5-2697 v2 processors. This implementation also outperforms NVIDIA's implementation running on an Intel Xeon processor-based server with two NVIDIA* K20Xm cards.
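The abstract attributes much of the gain to a heterogeneous approach, in which the host processors and the coprocessor work on the benchmark at the same time. The abstract does not say how the work is divided between them. As a minimal, hypothetical sketch, assume the workload consists of independent items (for example, Monte Carlo paths) and that each device's relative throughput was measured beforehand. A static split proportional to throughput can then be computed as below. The names `split_work`, `makespan`, and the device labels are illustrative assumptions, not taken from the paper.

```python
def split_work(total_items: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Hypothetical static partitioning: give each device a share of
    `total_items` proportional to its relative throughput (items per unit
    time), so that all devices finish at roughly the same time."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs.values()):
        raise ValueError("throughputs must be non-empty and positive")

    total_rate = sum(throughputs.values())
    # Ideal fractional share for each device.
    ideal = {d: total_items * t / total_rate for d, t in throughputs.items()}
    # Round every share down, then hand the leftover items to the devices
    # with the largest fractional remainders (largest-remainder rounding).
    shares = {d: int(x) for d, x in ideal.items()}
    leftover = total_items - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


def makespan(shares: dict[str, int], throughputs: dict[str, float]) -> float:
    """Estimated completion time: the slowest device determines the total."""
    return max(shares[d] / throughputs[d] for d in shares)
```

For instance, with a hypothetical host-to-coprocessor throughput ratio of 3:1, `split_work(1000, {"host": 3.0, "coprocessor": 1.0})` assigns 750 items to the host and 250 to the coprocessor, and both finish at the same estimated time. A static split is the simplest option; a dynamic scheme, in which devices pull chunks from a shared queue, tolerates inaccurate throughput estimates better at the cost of more coordination.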