High Performance Computational Finance: Latest Articles
Intel® version of STAC-A2 benchmark: toward better performance with less effort
Andrey Nikolaev, Ilya Burylov, S. Salahuddin
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535566

Abstract: Market risk analysis is a computationally intensive problem which requires powerful computing resources. To enable consistent comparisons of vendors' technologies in this area, the Securities Technology Analysis Center (STAC*), with input from leading trading companies, universities, and high-performance computing vendors, has created the STAC-A2* specifications, which describe realistic market risk analysis workloads.

In this paper we analyze and compare the performance of STAC-A2 workloads on two systems based on Intel® processors: the Intel® Xeon® processor E5 family and the Intel® Xeon Phi™ coprocessor. We show the impact of algorithmic optimizations and of a few mathematical building blocks (random number generation, mathematical functions, and matrix multiplication) on the overall performance of the benchmark. We demonstrate that the changes made in response to this analysis provide an additional ~1.6x performance improvement of the STAC-A2 benchmark on the Intel Xeon processor E5 family, and up to ~15x on Intel Xeon Phi coprocessor-based systems, compared with the previous version of the benchmark. The Intel Xeon Phi coprocessor architecture is ~1.10-1.38x faster than 16-core Intel Xeon processor E5 family-based systems, depending on the problem size, while the 32-core Intel Xeon processor E5 is the fastest among all analyzed platforms.

Citations: 7
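The building-block effect the authors describe (random number generation and vectorized math functions dominating runtime) can be illustrated with a minimal NumPy sketch. This shows only the class of optimization, not the paper's MKL/Xeon Phi implementation; the function names are invented for illustration:

```python
import numpy as np

def mc_asset_prices_scalar(s0, r, sigma, t, n, seed=0):
    """Naive per-sample loop: one RNG draw and one exp() call at a time."""
    rng = np.random.default_rng(seed)
    out = np.empty(n)
    for i in range(n):
        z = rng.standard_normal()
        out[i] = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return out

def mc_asset_prices_batched(s0, r, sigma, t, n, seed=0):
    """Same computation with batched RNG and one vectorized exp() call,
    the kind of building-block optimization the paper reports as decisive."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    return s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
```

The batched version does the same work in a handful of library calls, which is where an optimized vector math library (or coprocessor offload) can take over.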
Optimizing IBM Algorithmics' mark-to-future aggregation engine for real-time counterparty credit risk scoring
Amy Wang, Jan Treibig, Bob Blainey, Peng Wu, Yaoqing Gao, Barnaby Dalton, D. Gupta, Fahham Khan, Neil Bartlett, Lior Velichover, James Sedgwick, Louis Ly
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535567

Abstract: The concept of default and its associated painful repercussions have been a particular area of focus for financial institutions, especially after the 2007/2008 global financial crisis. Counterparty credit risk (CCR), i.e. the risk associated with a counterparty default prior to the expiration of a contract, has gained a tremendous amount of attention, which has resulted in new CCR measures and regulations being introduced. In particular, users would like to measure the potential impact of each real-time trade, or potential real-time trade, against exposure limits for the counterparty using Monte Carlo simulations of the trade value, and also to calculate the Credit Value Adjustment (i.e., how much it will cost to cover the risk of default with this particular counterparty if/when the trade is made). These rapid limit checks and CVA calculations demand more compute power from the hardware. Furthermore, with the emergence of electronic trading, the extremely low latency and high throughput required for real-time computation push both software and hardware capabilities to the limit. Our work focuses on optimizing the computation of risk measures and trade processing in the existing Mark-to-Future Aggregation (MAG) engine in the IBM Algorithmics product offering. We propose a new software approach to speed up end-to-end trade processing based on pre-compilation. The net result is an impressive speedup of 3-5x over the existing MAG engine on a real client workload, for processing trades that perform limit checks and CVA reporting on exposures while taking full collateral modelling into account.

Citations: 2
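The CVA quantity this engine accelerates can be sketched in its simplest Monte Carlo form. The code below is a hedged illustration, not the MAG engine's method: unilateral CVA for a single long forward contract under geometric Brownian motion, with a flat hazard rate and no collateral modelling. All names and parameters are hypothetical:

```python
import numpy as np

def cva_forward(S0, K, r, sigma, T, hazard, lgd=0.6,
                n_paths=100_000, n_steps=50, seed=0):
    """Unilateral CVA of a long forward:
    CVA = LGD * sum_i EE(t_i) * PD(t_{i-1}, t_i),
    with EE the discounted expected positive exposure from Monte Carlo."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(dt, T, n_steps)
    # GBM paths of the underlying
    z = rng.standard_normal((n_steps, n_paths))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=0))
    # Mark-to-market of the forward at each date: S_t - K * e^{-r(T-t)}
    V = S - K * np.exp(-r * (T - t))[:, None]
    # Discounted expected positive exposure per date
    EE = np.exp(-r * t) * np.maximum(V, 0.0).mean(axis=1)
    # Incremental default probabilities from a flat hazard rate
    surv = np.exp(-hazard * np.concatenate(([0.0], t)))
    pd_incr = surv[:-1] - surv[1:]
    return lgd * float(np.sum(EE * pd_incr))
```

A real-time limit check against this quantity is what forces the pre-compiled, low-latency design the paper describes.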
Pricing American options with least squares Monte Carlo on GPUs
M. Fatica, E. Phillips
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535564

Abstract: This paper presents an implementation of the Least Squares Monte Carlo (LSMC) method of Longstaff and Schwartz [1] to price American options on GPUs using CUDA. We focused our attention on the calibration phase and performed several experiments to assess the quality of the results. The implementation can price a put option with 200,000 paths and 50 time steps in less than 10 ms on a Tesla K20X.

Citations: 26
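The Longstaff-Schwartz algorithm the paper implements can be sketched on the CPU in a few lines. The following is a minimal NumPy version with a quadratic regression basis, a common simplification; it is not the paper's CUDA code, and the default parameters (the classic S0=36, K=40 put) are chosen for illustration:

```python
import numpy as np

def lsmc_american_put(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                      n_paths=200_000, n_steps=50, seed=0):
    """Least Squares Monte Carlo (Longstaff-Schwartz) for an American put."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)
    # Simulate GBM paths; S[t] holds all paths at time (t+1)*dt
    z = rng.standard_normal((n_steps, n_paths))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=0)
    S = S0 * np.exp(log_paths)
    # Cashflows at maturity
    cash = np.maximum(K - S[-1], 0.0)
    # Backward induction: regress continuation value on in-the-money paths
    for t in range(n_steps - 2, -1, -1):
        cash *= disc
        itm = K - S[t] > 0
        if not itm.any():
            continue
        x = S[t, itm]
        coeffs = np.polyfit(x, cash[itm], 2)      # quadratic basis
        continuation = np.polyval(coeffs, x)
        exercise = K - x
        ex_now = exercise > continuation
        idx = np.where(itm)[0][ex_now]
        cash[idx] = exercise[ex_now]              # exercise replaces future cashflow
    return disc * cash.mean()
```

The regression step (here `np.polyfit`) is the "calibration phase" the authors move to the GPU; with these parameters the price is close to the reference value of about 4.48.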
System architecture for on-line optimization of automated trading strategies
Fábio Daros Freitas, C. D. Freitas, A. D. Souza
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535563

Abstract: This work proposes a new automated trading system (ATS) architecture that supports multiple strategies for multiple market conditions through hierarchical trading-signal generation employing h-signals, which are trading signals generated from other trading signals. The central idea of the proposed system architecture is to decompose the trading problem into a set of tasks handled by distributed autonomous agents under minimal central coordination. We implemented the proposed ATS using a software architecture that employs a publish/subscribe communication model. At the current stage of development, we are able to run our ATS in back-test mode with moving-average crossover strategies on minute-by-minute market databases. We achieved very satisfactory performance, processing 306,791 database rows, representing more than two years of data, in only 47 seconds.

Citations: 2
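The h-signal idea (signals computed from other signals) can be illustrated with a toy sketch. The function names and the agreement rule below are hypothetical, invented for illustration; the paper's actual strategy set and agent decomposition are richer:

```python
import numpy as np

def ma_crossover_signal(prices, fast=10, slow=30):
    """Base signal: +1 while the fast moving average is above the slow one, else -1."""
    p = np.asarray(prices, dtype=float)

    def sma(x, w):
        return np.convolve(x, np.ones(w) / w, mode="valid")

    f = sma(p, fast)[slow - fast:]   # trim so both series align on the same dates
    s = sma(p, slow)
    return np.where(f > s, 1, -1)

def h_signal(sig_a, sig_b):
    """Hierarchical signal built from two child signals: trade only on agreement,
    stay flat (0) otherwise."""
    return np.where(sig_a == sig_b, sig_a, 0)
```

In the paper's architecture, each such generator would be an autonomous agent publishing its signal stream, and h-signal agents would subscribe to the child streams.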
Many-core architectures boost the pricing of basket options on adaptive sparse grids
A. Heinecke, J. Jepsen, H. Bungartz
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535560

Abstract: In this work, we present a highly scalable approach for numerically solving the Black-Scholes PDE in order to price basket options. Our method is based on a spatially adaptive sparse-grid discretization with finite elements. Since we cannot unleash the compute capabilities of modern many-core chips such as GPUs using the complexity-optimal Up-Down method, we implemented an embarrassingly parallel direct method. This operator is paired with a distributed-memory parallelization using MPI, and we achieved very good scalability results compared to the standard Up-Down approach. Since we exploit all levels of the operator's parallelism, we are able to achieve nearly perfect strong scaling for the Black-Scholes solver. Our results show that typical problem sizes (5-dimensional basket options) require at least 4 NVIDIA K20X Kepler GPUs (inside a Cray XK7) in order to be faster than the Up-Down scheme running on 16 Intel Sandy Bridge cores (one box). On a Cray XK7 machine we outperform our highly parallel Up-Down implementation by 55x with respect to time to solution. Both results emphasize the competitiveness of our proposed operator.

Citations: 3
Heterogeneous COS pricing of rainbow options
A. Cassagnes, Yu Chen, H. Ohashi
High Performance Computational Finance | Pub Date: 2013-11-18 | DOI: 10.1145/2535557.2535561

Abstract: This paper focuses on comparing different heterogeneous computational designs for the calculation of rainbow option prices using the Fourier-cosine series expansion (COS) method. We also propose a simple way to decide the load-balancing ratio automatically at runtime. A GPGPU implementation of the two-dimensional composite Simpson rule, free of conditional statements and with some degree of loop unrolling, is also introduced. We further show how to reduce the integration domain of the coefficients appearing in the option pricing and, by doing so, achieve a substantial speed-up and improved accuracy compared with a straightforward implementation.

Citations: 0
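For reference, the COS method in one dimension: a sketch pricing a European call under geometric Brownian motion, following the Fang-Oosterlee expansion. The paper's setting is two-dimensional rainbow payoffs with Simpson integration of the series coefficients; this shows only the core expansion, with illustrative parameters:

```python
import numpy as np

def cos_call(S0, K, r, sigma, T, N=256, L=10.0):
    """European call via the Fourier-cosine (COS) expansion under GBM."""
    # Truncation range [a, b] from the cumulants of x = ln(S_T / K)
    c1 = np.log(S0 / K) + (r - 0.5 * sigma**2) * T
    c2 = sigma**2 * T
    a, b = c1 - L * np.sqrt(c2), c1 + L * np.sqrt(c2)

    k = np.arange(N)
    u = k * np.pi / (b - a)

    # Characteristic function of x under GBM
    phi = np.exp(1j * u * c1 - 0.5 * u**2 * c2)

    # Cosine coefficients of the call payoff over [0, b]
    chi = (np.cos(k * np.pi) * np.exp(b) - np.cos(-u * a)
           + u * (np.sin(k * np.pi) * np.exp(b) - np.sin(-u * a))) / (1.0 + u**2)
    psi = np.empty(N)
    psi[0] = b
    psi[1:] = (np.sin(k[1:] * np.pi) - np.sin(-u[1:] * a)) / u[1:]
    V = 2.0 / (b - a) * K * (chi - psi)

    w = np.ones(N)
    w[0] = 0.5                       # first series term is halved
    return np.exp(-r * T) * np.sum(w * np.real(phi * np.exp(-1j * u * a)) * V)
```

With a smooth density, a few hundred terms already reproduce the Black-Scholes price to high accuracy, which is why the method maps well to the GPU designs the paper compares.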
Accounting for secondary uncertainty: efficient computation of portfolio risk measures on multi and many core architectures
B. Varghese, A. Rau-Chaplin
High Performance Computational Finance | Pub Date: 2013-10-08 | DOI: 10.1145/2535557.2535562

Abstract: Aggregate risk analysis is a computationally and data-intensive problem, which makes the application of high-performance computing techniques interesting. In this paper, the design and implementation of a parallel aggregate risk analysis algorithm on multi-core CPU and many-core GPU platforms are explored. We consider the efficient computation of key risk measures, including Probable Maximum Loss (PML) and Tail Value-at-Risk (TVaR), in the presence of both primary and secondary uncertainty for a portfolio of property catastrophe insurance treaties. Primary uncertainty is the uncertainty associated with whether a catastrophe event occurs in a simulated year, while secondary uncertainty is the uncertainty in the amount of loss when the event occurs.

A number of statistical algorithms are investigated for computing secondary uncertainty. Numerous challenges, such as loading large data onto hardware with limited memory and organising it, are addressed. The results obtained from experimental studies are encouraging. Consider, for example, an aggregate risk analysis involving 800,000 trials, with 1,000 catastrophic events per trial, a million locations, and a complex contract structure taking secondary uncertainty into account: the analysis can be performed in just 41 seconds on a GPU, 24x faster than the sequential counterpart on a fast multi-core CPU. The results indicate that GPUs can be used to efficiently accelerate aggregate risk analysis even in the presence of secondary uncertainty.

Citations: 3
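The two risk measures and the two layers of uncertainty can be sketched with a toy year loss table. The simulation below is illustrative only (a lognormal severity stands in for the paper's secondary-uncertainty distributions, and all names and parameters are hypothetical):

```python
import numpy as np

def simulate_annual_losses(n_trials, lam, mean_loss, sigma=1.0, seed=0):
    """Toy year loss table: primary uncertainty modelled as a Poisson event
    count per simulated year, secondary uncertainty as a lognormal loss
    amount given that an event occurs."""
    rng = np.random.default_rng(seed)
    mu = np.log(mean_loss) - 0.5 * sigma**2   # each event loss has mean `mean_loss`
    counts = rng.poisson(lam, n_trials)
    return np.array([rng.lognormal(mu, sigma, c).sum() for c in counts])

def pml_tvar(losses, p=0.99):
    """PML = p-quantile of the annual loss distribution;
    TVaR = mean of the losses at or beyond that quantile."""
    losses = np.sort(np.asarray(losses, dtype=float))
    pml = np.quantile(losses, p)
    tvar = losses[losses >= pml].mean()
    return pml, tvar
```

Scaling this from thousands of trials to the paper's 800,000 trials with 1,000 events each is what motivates the GPU implementation.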
DSL programmable engine for high frequency trading acceleration
Heiner Litz, Christian Leber, Benjamin Geib
High Performance Computational Finance | Pub Date: 2011-11-13 | DOI: 10.1145/2088256.2088268

Abstract: In high-frequency trading systems, a large number of orders must be processed with minimal latency at very high data rates. We propose an FPGA-based accelerator for high-frequency trading that decreases latency by an order of magnitude and increases the data rate by a similar factor compared with software-based CPU approaches. In particular, we focus on the acceleration of FAST, the most commonly used protocol for distributing pricing information for stocks and options over the network. As FPGAs are hard to program, we present a novel domain-specific language that enables our engine to be programmed via software. The code is compiled by our own compiler into binary microcode that is then executed on a microcode engine. In this paper we provide detailed insights into our hardware structure and the optimizations we applied to increase the data rate and the overall processing performance.

Citations: 4
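FAST's transfer encoding packs integers into 7-bit groups and uses the high bit of each byte as a stop bit, which is the kind of field-level decoding the accelerator performs in hardware. A minimal software sketch of that primitive follows; the full protocol also involves templates, presence maps, and field operators, none of which are shown here:

```python
def encode_stop_bit_uint(value):
    """Encode an unsigned integer in FAST stop-bit format: 7 payload bits
    per byte, with the high bit set only on the final byte."""
    out = [(value & 0x7F) | 0x80]        # last byte carries the stop bit
    value >>= 7
    while value:
        out.append(value & 0x7F)
        value >>= 7
    return bytes(reversed(out))

def decode_stop_bit_uint(data, offset=0):
    """Decode one stop-bit integer from `data` starting at `offset`;
    returns (value, next_offset)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:                   # stop bit reached: field is complete
            return value, offset
```

Because field boundaries are data-dependent (the stop bit), decoding is inherently sequential per field, which is one reason a dedicated hardware pipeline pays off.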
Algorithmic complexity in the Heston model: an implementation view
H. Marxen, A. Kostiuk, R. Korn, C. D. Schryver, S. Wurm, I. Shcherbakov, N. Wehn
High Performance Computational Finance | Pub Date: 2011-11-13 | DOI: 10.1145/2088256.2088261

Abstract: In this paper, we present an in-depth investigation of the influence of algorithmic parameters on barrier option pricing with the Heston model. For that purpose we focus on single- and multi-level Monte Carlo simulation methods. We investigate the impact of algorithmic variations on simulation time and energy consumption, giving detailed measurement results for a state-of-the-art 8-core CPU server and an Nvidia Tesla C2050 GPU. We show in particular that a naive algorithm on a powerful GPU can even increase the energy consumption and computation time compared with a better algorithm running on a standard CPU. Furthermore, we give preliminary results for a dedicated FPGA implementation and comment on the speedup and energy-saving potential of this architecture.

Citations: 3
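The core of the workload (path simulation of the Heston model with discrete barrier monitoring) can be sketched with an Euler scheme using full truncation of the variance, one of the standard discretizations such studies compare. This is an illustrative single-level sketch with hypothetical parameters, not the paper's implementation:

```python
import numpy as np

def heston_barrier_call(S0, v0, K, B, r, kappa, theta, xi, rho, T,
                        n_steps=250, n_paths=100_000, seed=0):
    """Down-and-out call under Heston: Euler scheme with full truncation,
    i.e. the variance is floored at 0 inside drift and diffusion terms."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    v = np.full(n_paths, float(v0))
    alive = np.ones(n_paths, dtype=bool)        # barrier not yet breached
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        vp = np.maximum(v, 0.0)                 # full truncation
        S *= np.exp((r - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1)
        v += kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
        alive &= S > B                          # discrete barrier monitoring
    payoff = np.where(alive, np.maximum(S - K, 0.0), 0.0)
    return np.exp(-r * T) * payoff.mean()
```

The per-step RNG, square roots, and exponentials are exactly the operations whose cost profile differs between CPU, GPU, and FPGA, which is what the paper's time and energy measurements quantify.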
Autotuning for high performance computing
D. Padua
High Performance Computational Finance | Pub Date: 2011-11-13 | DOI: 10.1145/2088256.2088264

Abstract: Program performance depends not only on the algorithms and data structures implemented in the program but also on coding parameters. These parameters include the frequency and size of messages, the shape of loop tiles, and the minimum number of iterations required for parallel execution of a loop. Making the right selection of algorithms, data structures, and coding parameters for a given target machine can be an onerous task, in part because of the many machine parameters that must be taken into account and the interactions between them. Important machine parameters include cache size, memory bandwidth, communication costs, and overhead. Furthermore, some of the selections must often be reassessed when porting to a different machine, even when that machine does not differ significantly from the original target.

It is clearly advantageous to make use of tools and techniques that help reduce the initial effort of programming for performance as well as the cost of porting. The tool that comes first to mind is the compiler. Compilers were developed to enable machine-independent programming and, to this end, apply powerful code generation and optimization strategies that take machine parameters into account. However, compilers do not always suffice. They operate almost exclusively at the coding level, and even at this low level they are not always effective. For example, compilers often fail to reorganize loops in the best manner when generating code for microprocessor vector extensions; good use of these extensions today requires manual intervention.

Autotuning programs are those capable of generating one or several versions of a program. These versions can be derived from a parameterized program, or from descriptions at a higher level of abstraction that may take the form of algorithms or even problem specifications. It is also desirable to take target machine parameters and the characteristics of the input data into account in the generation process.

When multiple versions are generated, one is selected at compile time or at run time by carrying out an empirical search that executes the versions on representative data and measures program performance to guide the selection.

Autotuning programs can be written in conventional code such as Fortran, C, C++, or Java, annotated with transformations that can be applied to the whole program or to code segments. Alternatively, autotuning programs can be written in a very high-level declarative notation that represents the algorithms or problems to be solved.

Although the initial cost of developing an autotuning program is higher than that of developing a conventional program, it has the advantage that much of the analysis required for the first target machine is done automatically, and that the program can be ported across machines and machine classes while maintaining good performance.

Citations: 1
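The empirical-search step described in the abstract can be sketched in a few lines. `autotune` and the toy kernel generator `make_chunked_sum` below are invented for illustration; real autotuners such as those the talk surveys search far larger parameter spaces:

```python
import time

def autotune(make_kernel, candidates, data, repeats=3):
    """Empirical autotuning: build each candidate version, time it on
    representative data, and keep the fastest configuration."""
    best_params, best_time = None, float("inf")
    for params in candidates:
        kernel = make_kernel(params)
        kernel(data)                          # warm-up run, excluded from timing
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data)
            times.append(time.perf_counter() - start)
        if min(times) < best_time:
            best_params, best_time = params, min(times)
    return best_params, best_time

def make_chunked_sum(chunk):
    """Toy 'coding parameter': sum a list in blocks of the given size."""
    def kernel(xs):
        return sum(sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk))
    return kernel
```

The same selection loop applies whether the versions differ in a tile size, a message size, or an entire algorithm, and it can run at install time or be repeated when porting to a new machine.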