Julian Oppermann, Mitchell B. Mickaliger, Oliver Sinnen
{"title":"Pulsar search acceleration using FPGAs and OpenCL templates","authors":"Julian Oppermann, Mitchell B. Mickaliger, Oliver Sinnen","doi":"10.1007/s10686-022-09888-z","DOIUrl":null,"url":null,"abstract":"<div><p>The Square Kilometre Array (SKA) is the world’s largest radio telescope currently under construction, and will employ elaborate signal processing to detect new pulsars, i.e. highly magnetised rotating neutron stars. This paper addresses the acceleration of demanding computations for this pulsar search on Field-Programmable Gate Arrays (FPGAs) using a new high-level design process based on OpenCL templates that is transferable to other scientific problems. The successful FPGA acceleration of large-scale scientific workloads requires custom architectures that fully exploit the parallel computing capabilities of modern reconfigurable hardware and are amenable to substantial design space exploration. OpenCL-based high-level synthesis toolchains, with their ability to express interconnected multi-kernel pipelines in a single source language, excel in this domain. However, the achievable performance strongly depends on how well the compiler can infer desirable hardware structures from the code. One key aspect to excellent performance is commonly the uninterrupted, high-bandwidth streaming of data into and through the design. This is difficult to achieve in complex designs when data order needs to be re-arranged, e.g. transposed. It is equally hard to pre-fetch and burst-load from DDR memory when reading occurs in non-trivial patterns. In this paper, we propose new approaches to these two problems that use OpenCL-based code templates.</p><p>We demonstrate the practical benefits of these approaches with the acceleration of a key component in the SKA’s pulsar search pipeline: the Fourier Domain Acceleration Search (FDAS) module. Using our proposed methodology, we are able to develop a more scalable FDAS accelerator architecture than previously possible. We explore its design space to eventually achieve a 10x throughput improvement over a prior, thoroughly optimised implementation in plain OpenCL.</p></div>","PeriodicalId":551,"journal":{"name":"Experimental Astronomy","volume":"56 1","pages":"239 - 266"},"PeriodicalIF":2.7000,"publicationDate":"2023-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10686-022-09888-z.pdf","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Experimental Astronomy","FirstCategoryId":"101","ListUrlMain":"https://link.springer.com/article/10.1007/s10686-022-09888-z","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ASTRONOMY & ASTROPHYSICS","Score":null,"Total":0}
引用次数: 1
Abstract
The Square Kilometre Array (SKA) is the world’s largest radio telescope currently under construction, and will employ elaborate signal processing to detect new pulsars, i.e. highly magnetised rotating neutron stars. This paper addresses the acceleration of demanding computations for this pulsar search on Field-Programmable Gate Arrays (FPGAs) using a new high-level design process based on OpenCL templates that is transferable to other scientific problems. The successful FPGA acceleration of large-scale scientific workloads requires custom architectures that fully exploit the parallel computing capabilities of modern reconfigurable hardware and are amenable to substantial design space exploration. OpenCL-based high-level synthesis toolchains, with their ability to express interconnected multi-kernel pipelines in a single source language, excel in this domain. However, the achievable performance strongly depends on how well the compiler can infer desirable hardware structures from the code. One key aspect to excellent performance is commonly the uninterrupted, high-bandwidth streaming of data into and through the design. This is difficult to achieve in complex designs when data order needs to be re-arranged, e.g. transposed. It is equally hard to pre-fetch and burst-load from DDR memory when reading occurs in non-trivial patterns. In this paper, we propose new approaches to these two problems that use OpenCL-based code templates.
We demonstrate the practical benefits of these approaches with the acceleration of a key component in the SKA’s pulsar search pipeline: the Fourier Domain Acceleration Search (FDAS) module. Using our proposed methodology, we are able to develop a more scalable FDAS accelerator architecture than previously possible. We explore its design space to eventually achieve a 10x throughput improvement over a prior, thoroughly optimised implementation in plain OpenCL.
期刊介绍:
Many new instruments for observing astronomical objects at a variety of wavelengths have been and are continually being developed. Furthermore, a vast amount of effort is being put into the development of new techniques for data analysis in order to cope with great streams of data collected by these instruments.
Experimental Astronomy acts as a medium for the publication of papers of contemporary scientific interest on astrophysical instrumentation and methods necessary for the conduct of astronomy at all wavelength fields.
Experimental Astronomy publishes full-length articles, research letters and reviews on developments in detection techniques, instruments, and data analysis and image processing techniques. Occasional special issues are published, giving an in-depth presentation of the instrumentation and/or analysis connected with specific projects, such as satellite experiments or ground-based telescopes, or of specialized techniques.