PARMA-DITAM@HiPEAC: Latest Publications

Multithread Accelerators on FPGAs: A Dataflow-Based Approach
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.6
Francesco Ratto, Stefano Esposito, Carlo Sau, L. Raffo, F. Palumbo
{"title":"Multithread Accelerators on FPGAs: A Dataflow-Based Approach","authors":"Francesco Ratto, Stefano Esposito, Carlo Sau, L. Raffo, F. Palumbo","doi":"10.4230/OASIcs.PARMA-DITAM.2022.6","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.6","url":null,"abstract":"Multithreading is a well-known technique for general-purpose systems to deliver a substantial performance gain, raising resource efficiency by exploiting underutilization periods. With the increase of specialized hardware, resource efficiency became fundamental to master the overhead introduced by this kind of device. In this work, we propose a model-based approach for designing specialized multithread hardware accelerators. This novel approach exploits dataflow models of applications and tagged tokens to let the resulting hardware support concurrent threads without the need to replicate the whole accelerator. Assessment is carried out over different versions of an accelerator for a compute-intensive step of modern video coding algorithms, under several feeding configurations. Results highlight that the proposed multithread accelerators achieve a valuable tradeoff: saving computational resources with respect to replicated parallel single-thread accelerators, while guaranteeing shorter waiting, response, and elaboration time than a unique single-thread accelerator multiplexed in time.","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124483652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
HPC Application Cloudification: The StreamFlow Toolkit (Invited Paper)
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2021.5
Iacopo Colonnelli, B. Cantalupo, Roberto Esposito, M. Pennisi, C. Spampinato, Marco Aldinucci
{"title":"HPC Application Cloudification: The StreamFlow Toolkit (Invited Paper)","authors":"Iacopo Colonnelli, B. Cantalupo, Roberto Esposito, M. Pennisi, C. Spampinato, Marco Aldinucci","doi":"10.4230/OASIcs.PARMA-DITAM.2021.5","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2021.5","url":null,"abstract":"","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123032411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Just-In-Time Composition of Reconfigurable Overlays (Invited Talk)
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.2
Rafael Zamacola, A. Otero, Alfonso Rodríguez, E. D. L. Torre
{"title":"Just-In-Time Composition of Reconfigurable Overlays (Invited Talk)","authors":"Rafael Zamacola, A. Otero, Alfonso Rodríguez, E. D. L. Torre","doi":"10.4230/OASIcs.PARMA-DITAM.2022.2","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.2","url":null,"abstract":"","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"133 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134162098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2021.2
Samuel Irving, Lu Peng, C. Busch, J. Peir
{"title":"BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs","authors":"Samuel Irving, Lu Peng, C. Busch, J. Peir","doi":"10.4230/OASIcs.PARMA-DITAM.2021.2","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2021.2","url":null,"abstract":"We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU clusters. The BifurKTM design includes: GPU KoSTM, a new software transactional memory conflict detection scheme that exploits relaxed consistency to increase throughput; and KoDTM, a Distributed Transactional Memory model that combines the Data- and Control-flow models to greatly reduce communication overheads. Despite the allure of huge speedups, GPUs are limited in use due to their programmability and extreme sensitivity to workload characteristics. These become daunting concerns when considering a distributed GPU cluster, wherein a programmer must design algorithms to hide communication latency by exploiting data regularity, high compute intensity, etc. The BifurKTM design allows GPU programmers to exploit a new workload characteristic: the percentage of the workload that is Read-Only (i.e., reads but does not modify shared memory), even when this percentage is not known in advance. Programmers designate transactions that are suitable for Approximate Consistency, in which transactions “appear” to execute at the most convenient time for preventing conflicts. By leveraging Approximate Consistency for Read-Only transactions, the BifurKTM runtime system offers improved performance, application flexibility, and programmability without introducing any errors into shared memory. Our experiments show that Approximate Consistency can improve BkTM performance by up to 34x in applications with moderate network communication utilization and a read-intensive workload. Using Approximate Consistency, BkTM can reduce GPU-to-GPU network communication by 99%, reduce the number of aborts by up to 100%, and achieve an average speedup of 18x over a similarly sized CPU cluster while requiring minimal effort from the programmer. 2012 ACM Subject Classification Computer systems organization → Heterogeneous (hybrid) systems","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"430 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115932865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Towards Adaptive Multi-Alternative Process Network
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2021.1
Hasna Bouraoui, Chadlia Jerad, J. Castrillón
{"title":"Towards Adaptive Multi-Alternative Process Network","authors":"Hasna Bouraoui, Chadlia Jerad, J. Castrillón","doi":"10.4230/OASIcs.PARMA-DITAM.2021.1","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2021.1","url":null,"abstract":"With the increase of voice-controlled systems, speech-based recognition applications are gaining more attention. Such applications need to adapt to hardware platforms to offer the required performance. Given the streaming nature of these applications, dataflow models are a common choice for model-based design and execution on parallel embedded platforms. However, most of today’s models are built on top of classical static dataflow with adaptivity extensions to express data parallelism. In this paper, we define and describe an approach for algorithmic adaptivity to express richer sets of variants and trade-offs. For this, we introduce multi-Alternative Process Network (mAPN), a high-level abstract representation where several process networks of the same application coexist. We describe an algorithm for automatic generation of all possible alternatives. The mAPN is enriched with meta-data serving to endow the alternatives with annotations in terms of a specific metric, helping to extract the most suitable alternative depending on the available computational resources and application/user constraints. We motivate the approach by the automatic subtitling application (ASA) as a use case and run the experiments on an mAPN sample consisting of 12 randomly selected possible variants. 2012 ACM Subject Classification Theory of computation → Streaming models","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129854072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Energy-Aware HEVC Software Decoding On Mobile Heterogeneous Multi-Cores Architectures
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.4
Mohammed Bey Ahmed Khernache, Jalil Boukhobza, Yahia Benmoussa, D. Ménard
{"title":"Energy-Aware HEVC Software Decoding On Mobile Heterogeneous Multi-Cores Architectures","authors":"Mohammed Bey Ahmed Khernache, Jalil Boukhobza, Yahia Benmoussa, D. Ménard","doi":"10.4230/OASIcs.PARMA-DITAM.2022.4","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.4","url":null,"abstract":"Video content is becoming increasingly omnipresent on mobile platforms thanks to advances in mobile heterogeneous architectures. These platforms typically include limited rechargeable batteries which do not improve as fast as video content. Most state-of-the-art studies proposed solutions based on parallelism to exploit the GPP heterogeneity and DVFS to scale up/down the GPP frequency based on the video workload. However, some studies assume that information about the workload is available before decoding starts. Others do not exploit the asymmetric character of recent mobile architectures. To address these two challenges, we propose a solution based on classification and frequency scaling. First, a model to classify frames based on their type and size is built during design-time. Second, this model is applied for each frame to decide which GPP cores will decode it. Third, the frequency of the chosen GPP cores is dynamically adjusted based on the output buffer size. Experiments on real-world mobile platforms show that the proposed solution can save more than 20% of energy (mJ/Frame) compared to the Ondemand Linux governor with less than 5% of miss rate. Moreover, it needs less than one second of decoding to enter the stable state and the overhead represents less than 1% of the frame decoding time.","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"216 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134031461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Memory Management for Modelica Simulations
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.7
Michele Scuttari, Nicola Camillucci, Daniele Cattaneo, F. Terraneo, G. Agosta
{"title":"Efficient Memory Management for Modelica Simulations","authors":"Michele Scuttari, Nicola Camillucci, Daniele Cattaneo, F. Terraneo, G. Agosta","doi":"10.4230/OASIcs.PARMA-DITAM.2022.7","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.7","url":null,"abstract":"The ever increasing usage of simulations in order to produce digital twins of physical systems led to the creation of specialized equation-based modeling languages such as Modelica. However, compilers of such languages often generate code that exploits the garbage collection memory management paradigm, which introduces significant runtime overhead. In this paper we explain how to improve the memory management approach of the automatically generated simulation code. This is achieved by addressing two different aspects. One regards the reduction of the heap memory usage, which is obtained by modifying functions whose resulting arrays could instead be allocated on the stack by the caller. The other aspect regards the possibility of avoiding garbage collection altogether by performing all memory lifetime tracking statically. We implement our approach in a prototype Modelica compiler, achieving an improvement of the memory management overhead of over 10 times compared to a garbage collected solution, and an improvement of 56 times compared to the production-grade compiler OpenModelica. 2012 ACM Subject Classification Software and its engineering → Compilers; Computing methodologies → and","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127774580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Precision Tuning in Parallel Applications
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.5
Gabriele Magnani, Lev Denisov, Daniele Cattaneo, G. Agosta
{"title":"Precision Tuning in Parallel Applications","authors":"Gabriele Magnani, Lev Denisov, Daniele Cattaneo, G. Agosta","doi":"10.4230/OASIcs.PARMA-DITAM.2022.5","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.5","url":null,"abstract":"Nowadays, parallel applications are used every day in high performance computing, scientific computing and also in everyday tasks due to the pervasiveness of multi-core architectures. However, several implementation challenges have so far stifled the integration of parallel applications and automatic precision tuning. First of all, tuning a parallel application introduces difficulties in the detection of the region of code that must be affected by the optimization. Moreover, additional challenges arise in handling shared variables and accumulators. In this work we address such challenges by introducing OpenMP parallel programming support to the TAFFO precision tuning framework. With our approach we achieve speedups of up to 750% with respect to the same parallel application without precision tuning. 2012 ACM Subject Classification Software and its engineering → Compilers; Theory of computation → Parallel computing models","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131402423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SO(DA)2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk)
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2022.1
Antonino Tumeo, Nicolas Bohm Agostini, S. Curzel, Ankur Limaye, Cheng Tan, Vinay C. Amatya, Marco Minutoli, Vito Giovanni Castellana, Ang Li, J. Manzano
{"title":"SO(DA)2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk)","authors":"Antonino Tumeo, Nicolas Bohm Agostini, S. Curzel, Ankur Limaye, Cheng Tan, Vinay C. Amatya, Marco Minutoli, Vito Giovanni Castellana, Ang Li, J. Manzano","doi":"10.4230/OASIcs.PARMA-DITAM.2022.1","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.1","url":null,"abstract":"Modern data analysis applications are complex workflows composed of algorithms with diverse behaviors. They may include digital signal processing, data filtering, reduction, compression, graph algorithms, and machine learning. Their performance is highly dependent on the volume, the velocity, and the structure of the data. They are used in many different domains (from small, embedded devices, to large-scale, high-performance computing systems) but in all cases they need to provide answers with very low latency to enable real-time decision making and autonomy. Coarse-grained reconfigurable arrays (CGRAs), i.e., architectures composed of functional units able to perform complex operations interconnected through a network-on-chip and configure the datapath to map complex kernels, are a promising platform to accelerate these applications thanks to their adaptability. They provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased energy efficiency and faster reconfiguration speed with respect to field-programmable gate arrays (FPGAs). However, designing and specializing CGRAs requires significant effort. The inherent flexibility of these devices makes the application mapping process as important as the hardware design generation. To obtain efficient systems, approaches that simultaneously consider software and hardware optimizations are necessary. In this paper, we discuss the Software Defined Architectures for Data Analytics (SO(DA)2) toolchain, an end-to-end hardware/software codesign framework to generate custom reconfigurable architectures for data analytics applications. SO(DA)2 is composed of a high-level compiler (SODA-OPT) and a hardware generator (OpenCGRA) and can automatically explore and generate optimal CGRA designs starting from high-level programming frameworks. SO(DA)2 considers partial dynamic reconfiguration as a key element of the system design. We discuss the various elements of the framework and demonstrate the flow on the case study of a partial dynamic reconfigurable CGRA design for data streaming applications. Acknowledgements The research described in this paper is part of the Data-Model Convergence (DMC) Initiative at Pacific Northwest National Laboratory. It was conducted under the Laboratory Directed Research and Development Program at PNNL, a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy.","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116575029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
The Impact of Precision Tuning on Embedded Systems Performance: A Case Study on Field-Oriented Control
PARMA-DITAM@HiPEAC Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.PARMA-DITAM.2021.3
Gabriele Magnani, Daniele Cattaneo, M. Chiari, G. Agosta
{"title":"The Impact of Precision Tuning on Embedded Systems Performance: A Case Study on Field-Oriented Control","authors":"Gabriele Magnani, Daniele Cattaneo, M. Chiari, G. Agosta","doi":"10.4230/OASIcs.PARMA-DITAM.2021.3","DOIUrl":"https://doi.org/10.4230/OASIcs.PARMA-DITAM.2021.3","url":null,"abstract":"Field Oriented Control (FOC) is an industry-standard strategy for controlling induction motors and other kinds of AC-based motors. This control scheme has a very high arithmetic intensity when implemented digitally – in particular it requires the use of trigonometric functions. This requirement contrasts with the necessity of increasing the control step frequency when required, and the minimization of power consumption in applications where conserving battery life is paramount such as drones. However, it also makes FOC well suited for optimization using precision tuning techniques. Therefore, we exploit the state-of-the-art FixM methodology to optimize a miniapp simulating a typical FOC application by applying precision tuning of trigonometric functions. The FixM approach itself was extended in order to implement additional algorithm choices to enable a trade-off between execution time and code size. With the application of FixM on the miniapp, we achieved a speedup of up to 278%, at the cost of an output error below 0.1%. 2012 ACM Subject Classification Hardware → Power estimation and optimization; Software and its engineering → Compilers; Applied computing → Consumer health","PeriodicalId":436349,"journal":{"name":"PARMA-DITAM@HiPEAC","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128292128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4