2020 IEEE International Symposium on High Performance Computer Architecture (HPCA): Latest Publications

BBS: Micro-Architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00041
Liang Zhu, Chao Chen, Zihao Su, Weiguang Chen, Tao Li, Zhibin Yu
{"title":"BBS: Micro-Architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set","authors":"Liang Zhu, Chao Chen, Zihao Su, Weiguang Chen, Tao Li, Zhibin Yu","doi":"10.1109/HPCA47549.2020.00041","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00041","url":null,"abstract":"Due to the decentralization, irreversibility, and traceability, blockchain has attracted significant attention and has been deployed in many critical industries such as banking and logistics. However, the micro-architecture characteristics of blockchain programs still remain unclear. What's worse, the large number of micro-architecture events make understanding the characteristics extremely difficult. We even lack a systematic approach to identify the important events to focus on. In this paper, we propose a novel benchmarking methodology dubbed BBS to characterize blockchain programs at micro-architecture level. The key is to leverage fuzzy set theory to identify important micro-architecture events after the significance of them is quantified by a machine learning based approach. The important events for single programs are employed to characterize the programs while the common important events for multiple programs form an importance vector which is used to measure the similarity between benchmarks. We leverage BBS to characterize seven and six benchmarks from Blockbench and Caliper, respectively. The results show that BBS can reveal interesting findings. Moreover, by leveraging the importance characterization results, we improve that the transaction throughput of Smallbank from Fabric by 70% while reduce the transaction latency by 55%. In addition, we find that three of seven and two of six benchmarks from Blockbench and Caliper are redundant, respectively.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129628799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
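The entry above builds an importance vector of micro-architecture events per benchmark and compares benchmarks through that vector. The following Python sketch is an illustration only, not the authors' code: it scores events with an off-the-shelf tree ensemble (the paper additionally applies fuzzy set theory to select the important events) and compares two benchmarks by cosine similarity; the function names, synthetic data, and model choice are all assumptions.

```python
# Illustrative sketch (not the authors' code): quantify the significance of
# micro-architecture events for a benchmark with a tree-ensemble model, then
# compare two benchmarks by the cosine similarity of their importance vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def importance_vector(event_counts, perf_metric):
    """event_counts: (samples x events) PMU readings; perf_metric: e.g. IPC per sample."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(event_counts, perf_metric)
    return model.feature_importances_          # one weight per micro-architecture event

def similarity(vec_a, vec_b):
    """Cosine similarity between two importance vectors (1.0 = identical emphasis)."""
    return float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Synthetic demo: two benchmarks measured over 100 runs and 20 PMU events each.
rng = np.random.default_rng(0)
events_a, ipc_a = rng.random((100, 20)), rng.random(100)
events_b, ipc_b = rng.random((100, 20)), rng.random(100)
print(similarity(importance_vector(events_a, ipc_a),
                 importance_vector(events_b, ipc_b)))
```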
EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00016
Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović
{"title":"EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals","authors":"Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović","doi":"10.1109/HPCA47549.2020.00016","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00016","url":null,"abstract":"Side-channel attacks have become a serious security concern for computing systems, especially for embedded devices, where the device is often located in, or in proximity to, a public place, and yet the system contains sensitive information. To design systems that are highly resilient to such attacks, an accurate and efficient design-stage quantitative analysis of side-channel leakage is needed. For many systems properties (e.g., performance, power, etc.), cycle-accurate simulation can provide such an efficient-yet-accurate design-stage estimate. Unfortunately, for an important class of side-channels, electromagnetic emanations, such a model does not exist, and there has not even been much quantitative evidence about what level of modeling detail (e.g., hardware, microarchitecture, etc.) would be needed for high accuracy. This paper presents EMSim, an approach that enables simulation of the electromagnetic (EM) side-channel signals cycle-by-cycle using a detailed micro-architectural model of the device. To evaluate EMSim, we compare its signals against actual EM signals emanated from real hardware (FPGA-based RISC-V processor), and find that they match very closely. To gain further insights, we also experimentally identify how the accuracy of the simulation degrades when key microarchitectural features (e.g., pipeline stall, cache-miss, etc.) and other hardware behaviors (e.g., data-dependent switching activity) are omitted from the simulation model. We further evaluate how robust the simulation-based results are, by comparing them to real signals collected in different conditions (manufacturing, distance, etc.). Finally, to show the applicability of EMSim, we demonstrate how it can be used to measure side-channel leakage through simulation at design-stage.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129753659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
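As a rough intuition for what a cycle-by-cycle EM model computes, the toy Python sketch below treats the emitted signal at each cycle as a weighted sum of per-unit switching activity. It is not EMSim; the unit names, weights, and activity counts are invented for illustration.

```python
# Illustrative sketch (not EMSim itself): a toy cycle-by-cycle EM model in which
# the emitted signal at each cycle is a weighted sum of per-unit switching
# activity. Unit names and weights are made up for illustration.
import numpy as np

UNIT_WEIGHTS = {            # hypothetical per-unit contribution to the EM signal
    "fetch":   0.4,
    "alu":     1.0,
    "cache":   0.7,
    "regfile": 0.3,
}

def em_trace(activity_per_cycle):
    """activity_per_cycle: list of dicts mapping unit name -> toggled bits that cycle."""
    trace = []
    for cycle in activity_per_cycle:
        sample = sum(UNIT_WEIGHTS[unit] * toggles for unit, toggles in cycle.items())
        trace.append(sample)
    return np.asarray(trace)

# Two instruction sequences whose differing activity would yield distinguishable traces.
seq_a = [{"fetch": 8, "alu": 30, "cache": 0,  "regfile": 4} for _ in range(4)]
seq_b = [{"fetch": 8, "alu": 2,  "cache": 60, "regfile": 4} for _ in range(4)]
print(em_trace(seq_a), em_trace(seq_b))
```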
Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00052
Marzieh Lenjani, Patricia Gonzalez-Guerrero, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, S. Eilert, M. Stan, K. Skadron
{"title":"Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators","authors":"Marzieh Lenjani, Patricia Gonzalez-Guerrero, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, S. Eilert, M. Stan, K. Skadron","doi":"10.1109/HPCA47549.2020.00052","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00052","url":null,"abstract":"In-situ approaches process data very close to the memory cells, in the row buffer of each subarray. This minimizes data movement costs and affords parallelism across subarrays. However, current in-situ approaches are limited to only row-wide bitwise (or few-bit) operations applied uniformly across the row buffer. They impose a significant overhead of multiple row activations for emulating 32-bit addition and multiplications using bitwise operations and cannot support operations with data dependencies or based on predicates. Moreover, with current peripheral logic, communication among subarrays is inefficient, and with typical data layouts, bits in a word are not physically adjacent. The key insight of this work is that in-situ, single-word ALUs outperform in-situ, parallel, row-wide, bitwise ALUs by reducing the number of row activations and enabling new operations and optimizations. Our proposed lightweight access and control mechanism, Fulcrum, sequentially feeds data into the single-word ALU and enables operations with data dependencies and operations based on a predicate. For algorithms that require communication among subarrays, we augment the peripheral logic with broadcasting capabilities and a previously-proposed method for low-cost inter-subarray data movement. The sequential processor also enables overlapping of broadcasting and computation, and reuniting bits that are physically adjacent. In order to realize true subarray-level parallelism, we introduce a lightweight column-selection mechanism through shifting one-hot encoded values. This technique enables independent column selection in each subarray. We integrate Fulcrum with Compress Express Link (CXL), a new interconnect standard. Fulcrum with one memory stack delivers on average (up to) 23.4 (76) speedup over a server-class GPU, NVIDIA P100, with three stacks of HBM2 memory, (ii) 70 (228) times speedup per memory stack over the GPU, and (iii) 19 (178.9) times speedup per memory stack over an ideal model of the GPU, which only accounts for the overhead of data movement.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134200248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
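The column-selection idea in the entry above, selecting one word at a time by shifting a one-hot value, can be illustrated with the hedged Python sketch below. It is a software analogy, not the Fulcrum hardware; the row-buffer width, word size, and helper names are assumptions.

```python
# Illustrative sketch (not the Fulcrum RTL): column selection by shifting a
# one-hot value, so each subarray walks its row buffer independently while a
# single-word ALU consumes one word per step. All structures are assumptions.
def select_columns(row_buffer, word_bits=32):
    """Yield consecutive words from a row buffer using a shifting one-hot selector."""
    num_words = len(row_buffer) // word_bits
    onehot = 1                                   # one-hot register: bit i selects word i
    for _ in range(num_words):
        idx = onehot.bit_length() - 1            # position of the single set bit
        yield row_buffer[idx * word_bits:(idx + 1) * word_bits]
        onehot <<= 1                             # shift to select the next word

row = [1] * 256                                  # a 256-bit row buffer (8 x 32-bit words)
for word in select_columns(row):
    pass                                         # a single-word ALU would operate here
```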
A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00020
Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović
{"title":"A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit","authors":"Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović","doi":"10.1109/HPCA47549.2020.00020","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00020","url":null,"abstract":"This paper presents a new micro-architectural vulnerability on the power management units of modern computers which creates an electromagnetic-based side-channel. The key observations that enable us to discover this sidechannel are: 1) in an effort to manage and minimize power consumption, modern microprocessors have a number of possible operating modes (power states) in which various sub-systems of the processor are powered down, 2) for some of the transitions between power states, the processor also changes the operating mode of the voltage regulator module (VRM) that supplies power to the affected sub-system, and 3) the electromagnetic (EM) emanations from the VRM are heavily dependent on its operating mode. As a result, these state-dependent EM emanations create a side-channel which can potentially reveal sensitive information about the current state of the processor and, more importantly, the programs currently being executed. To demonstrate the feasibility of exploiting this vulnerability, we create a covert channel by utilizing the changes in the processor's power states. We show how such a covert channel can be leveraged to exfiltrate sensitive information from a secured and completely isolated (air-gapped) laptop system by placing a compact, inexpensive receiver in proximity to that system. To further show the severity of this attack, we also demonstrate how such a covert channel can be established when the target and the receiver are several meters away from each other, including scenarios where the receiver and the target are separated by a wall. Compared to the state-of-the-art, the proposed covert channel has >3x higher bit-rate. Finally, to demonstrate that this new vulnerability is not limited to being used as a covert channel, we demonstrate how it can be used for attacks such as keystroke logging.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133301750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
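For intuition about how a power-state covert channel can encode data, the sketch below maps each bit to a window of either busy computation (keeping the core and its VRM in an active state) or idle sleep (letting them drop to a deeper power state). This is a generic illustration, not the paper's transmitter; the window length and workload are arbitrary assumptions, and a real attack would sense the resulting EM emanations with an external receiver.

```python
# Illustrative sketch only (not the paper's transmitter): a generic power-state
# covert-channel encoder that maps each bit to a window of either busy compute
# or idle sleep, producing power-state transitions a nearby receiver could
# distinguish. Window length and the busy workload are assumptions.
import time

def send_bits(bits, window_s=0.05):
    for bit in bits:
        end = time.perf_counter() + window_s
        if bit:
            while time.perf_counter() < end:   # busy loop: keeps an active power state
                _ = sum(i * i for i in range(1000))
        else:
            time.sleep(window_s)               # idle: processor enters a deeper state

send_bits([1, 0, 1, 1, 0])                      # transmits the pattern 10110
```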
DRAM-Less: Hardware Acceleration of Data Processing with New Memory
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00032
Jie Zhang, Gyuyoung Park, D. Donofrio, J. Shalf, Myoungsoo Jung
{"title":"DRAM-Less: Hardware Acceleration of Data Processing with New Memory","authors":"Jie Zhang, Gyuyoung Park, D. Donofrio, J. Shalf, Myoungsoo Jung","doi":"10.1109/HPCA47549.2020.00032","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00032","url":null,"abstract":"General purpose hardware accelerators have become major data processing resources in many computing domains. However, the processing capability of hardware accelerations is often limited by costly software interventions and memory copies to support compulsory data movement between different processors and solid-state drives (SSDs). This in turn also wastes a significant amount of energy in modern accelerated systems. In this work, we propose, DRAM-less, a hardware automation approach that precisely integrates many state-of-the-art phase change memory (PRAM) modules into its data processing network to dramatically reduce unnecessary data copies with a minimum of software modifications. We implement a new memory controller that plugs a real 3x nm multi-partition PRAM to 28nm technology FPGA logic cells and interoperate its design into a real PCIe accelerator emulation platform. The evaluation results reveal that our DRAM-less achieves, on average, 47% better performance than advanced acceleration approaches that use a peer-to-peer DMA.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124011422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00057
Jawad Haj-Yahya, Yiannakis Sazeides, M. Alser, Efraim Rotem, O. Mutlu
{"title":"Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices","authors":"Jawad Haj-Yahya, Yiannakis Sazeides, M. Alser, Efraim Rotem, O. Mutlu","doi":"10.1109/HPCA47549.2020.00057","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00057","url":null,"abstract":"Modern mobile devices, such as smartphones, tablets, and laptops, are idle most of the time but they remain connected to communication channels even when idle. This operation mode is called connected-standby. To increase battery life in the connected-standby mode, a mobile device enters the deepest-runtime-idle-power state (DRIPS), which minimizes power consumption and retains fast wake-up capability. In this work, we identify three sources of energy inefficiency in modern DRIPS designs and introduce three techniques to reduce the power consumption of mobile devices in connected-standby. To our knowledge, this is the first work to explicitly focus on and improve the connected-standby power management of high-performance mobile devices, with evaluations on a real system. We propose the optimized-deepest-runtime-idle-power state (ODRIPS), a mechanism that dynamically: 1) offloads the monitoring of wake-up events to low-power off-chip circuitry, which enables turning off all of the processor's clock sources, 2) offloads all of the processor's input/output functionality off-chip and power-gates the corresponding on-chip input/output functions, and 3) transfers the processor's context to a secure memory region inside DRAM, which eliminates the need to store the context using on-chip high-leakage SRAMs, thereby reducing leakage power. We implement ODRIPS in Intel's Skylake client processor and its associated Sunrise-Point chipset. An analysis of ODRIPS on a real system reveals that it reduces the platform average power consumption in connected-standby mode by 22%. We also identify an opportunity to further reduce ODRIPS power by using emerging low-power non-volatile memory (instead of DRAM) to store the processor context.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Experiences with ML-Driven Design: A NoC Case Study
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00058
Jieming Yin, Subhash Sethumurugan, Yasuko Eckert, Chintan Patel, Alan Smith, Eric Morton, M. Oskin, Natalie D. Enright Jerger, G. Loh
{"title":"Experiences with ML-Driven Design: A NoC Case Study","authors":"Jieming Yin, Subhash Sethumurugan, Yasuko Eckert, Chintan Patel, Alan Smith, Eric Morton, M. Oskin, Natalie D. Enright Jerger, G. Loh","doi":"10.1109/HPCA47549.2020.00058","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00058","url":null,"abstract":"There has been a lot of recent interest in applying machine learning (ML) to the design of systems, which purports to aid human experts in extracting new insights leading to better systems. In this work, we share our experiences with applying ML to improve one aspect of networks-on-chips (NoC) to uncover new ideas and approaches, which eventually led us to a new arbitration scheme that is effective for NoCs under heavy contention. However, a significant amount of human effort and creativity was still needed to optimize just one aspect (arbitration) of what is only one component (the NoC) of the overall processor. This leads us to conclude that much work (and opportunity!) remains to be done in the area of ML-driven architecture design.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127887965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Domain-Specialized Cache Management for Graph Analytics
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-01-22 DOI: 10.1109/HPCA47549.2020.00028
P. Faldu, Jeff Diamond, Boris Grot
{"title":"Domain-Specialized Cache Management for Graph Analytics","authors":"P. Faldu, Jeff Diamond, Boris Grot","doi":"10.1109/HPCA47549.2020.00028","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00028","url":null,"abstract":"Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected, hot, vertices inherently exhibit high reuse. However, this work finds that state-of-the-art hardware cache management schemes struggle in capitalizing on their reuse due to highly irregular access patterns of graph analytics. In response, we propose GRASP, domain-specialized cache management at the last-level cache for graph analytics. GRASP augments existing cache policies to maximize reuse of hot vertices by protecting them against cache thrashing, while maintaining sufficient flexibility to capture the reuse of other vertices as needed. GRASP keeps hardware cost negligible by leveraging lightweight software support to pinpoint hot vertices, thus eliding the need for storage-intensive prediction mechanisms employed by state-of-the-art cache management schemes. On a set of diverse graph-analytic applications with large high-skew graph datasets, GRASP outperforms prior domain-agnostic schemes on all datapoints, yielding an average speed-up of 4.2% (max 9.4%) over the best-performing prior scheme. GRASP remains robust on low-/no-skew datasets, whereas prior schemes consistently cause a slowdown.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
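As a software analogy for the cache policy described above (not the GRASP hardware), the sketch below models one cache set in which lines tagged as holding hot-vertex data are inserted at a high, protected priority so that lower-priority lines are evicted first; the priority values and eviction rule are assumptions.

```python
# Illustrative sketch (a software analogy, not the GRASP hardware): an LLC-like
# set where hot-vertex lines get a high, protected priority and the victim is
# always a lowest-priority line, so hot lines survive thrashing by cold lines.
class ProtectedSet:
    def __init__(self, ways=16):
        self.ways = ways
        self.lines = {}                      # addr -> priority (higher = keep longer)

    def access(self, addr, is_hot):
        if addr in self.lines:               # hit: re-promote, capping cold lines low
            self.lines[addr] = 3 if is_hot else min(self.lines[addr] + 1, 2)
            return "hit"
        if len(self.lines) >= self.ways:     # miss in a full set: evict a
            victim = min(self.lines, key=self.lines.get)   # lowest-priority victim
            del self.lines[victim]
        self.lines[addr] = 3 if is_hot else 1
        return "miss"

llc_set = ProtectedSet()
for addr, hot in [(0xA0, True), (0xB0, False), (0xA0, True)]:
    print(hex(addr), llc_set.access(addr, hot))
```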
HyGCN: A GCN Accelerator with Hybrid Architecture
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-01-07 DOI: 10.1109/HPCA47549.2020.00012
Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie
{"title":"HyGCN: A GCN Accelerator with Hybrid Architecture","authors":"Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie","doi":"10.1109/HPCA47549.2020.00012","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00012","url":null,"abstract":"Inspired by the great success of neural networks, graph convolutional neural networks (GCNs) are proposed to analyze graph data. GCNs mainly include two phases with distinct execution patterns. The Aggregation phase, behaves as graph processing, showing a dynamic and irregular execution pattern. The Combination phase, acts more like the neural networks, presenting a static and regular execution pattern. The hybrid execution patterns of GCNs require a design that alleviates irregularity and exploits regularity. Moreover, to achieve higher performance and energy efficiency, the design needs to leverage the high intra-vertex parallelism in Aggregation phase, the highly reusable inter-vertex data in Combination phase, and the opportunity to fuse phase-by-phase execution introduced by the new features of GCNs. However, existing architectures fail to address these demands. In this work, we first characterize the hybrid execution patterns of GCNs on Intel Xeon CPU. Guided by the characterization, we design a GCN accelerator, HyGCN, using a hybrid architecture to efficiently perform GCNs. Specifically, first, we build a new programming model to exploit the fine-grained parallelism for our hardware design. Second, we propose a hardware design with two efficient processing engines to alleviate the irregularity of Aggregation phase and leverage the regularity of Combination phase. Besides, these engines can exploit various parallelism and reuse highly reusable data efficiently. Third, we optimize the overall system via inter-engine pipeline for inter-phase fusion and priority-based off-chip memory access coordination to improve off-chip bandwidth utilization. Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA V100 GPU, our work achieves on average 1509× speedup with 2500× energy reduction and average 6.5× speedup with 10× energy reduction, respectively.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116975333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 203
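The two GCN phases contrasted above can be made concrete with a minimal NumPy sketch: an irregular, gather-style Aggregation over each vertex's neighbors followed by a regular, dense Combination matrix multiply. The sketch is illustrative only; the tiny graph, the shapes, and the self-loop handling are assumptions, and it does not reflect HyGCN's hardware design.

```python
# Illustrative sketch (not HyGCN itself): the two GCN phases the abstract
# contrasts -- an irregular, per-vertex Aggregation over neighbor features,
# followed by a regular, dense-matrix Combination shared by all vertices.
import numpy as np

def gcn_layer(neighbors, features, weights):
    # Aggregation phase: irregular gather-and-sum over each vertex's neighbors.
    aggregated = np.stack([
        features[nbrs].sum(axis=0) + features[v]        # include a self-loop
        for v, nbrs in enumerate(neighbors)
    ])
    # Combination phase: regular dense matrix multiply plus ReLU.
    return np.maximum(aggregated @ weights, 0.0)

neighbors = [np.array([1, 2]), np.array([0]), np.array([0, 1])]   # 3-vertex graph
features  = np.random.rand(3, 8)                                  # 8 features per vertex
weights   = np.random.rand(8, 4)                                  # 8 -> 4 transformation
print(gcn_layer(neighbors, features, weights).shape)              # (3, 4)
```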
Communication Lower Bound in Convolution Accelerators
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-11-08 DOI: 10.1109/HPCA47549.2020.00050
Xiaoming Chen, Yinhe Han, Yu Wang
{"title":"Communication Lower Bound in Convolution Accelerators","authors":"Xiaoming Chen, Yinhe Han, Yu Wang","doi":"10.1109/HPCA47549.2020.00050","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00050","url":null,"abstract":"In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides comprehensive analysis and methodologies to minimize the communication for CNN accelerators. For the off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow to reach the lower bound. This fundamental problem has never been solved by prior studies. The on-chip communication is minimized based on an elaborate workload and storage mapping scheme. We in addition design a communication-optimal CNN accelerator architecture. Evaluations based on the 65nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and it is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114927758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
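The paper derives a buffer-size-aware lower bound on off-chip traffic; that derivation is not reproduced here. As a much weaker baseline for intuition, the sketch below counts only compulsory traffic, i.e., every input, weight, and output element of a convolutional layer must cross the off-chip boundary at least once; the layer shape and the 2-byte data width are arbitrary assumptions.

```python
# Illustrative sketch: a baseline compulsory-traffic figure for one convolutional
# layer -- every input, weight, and output element crosses the off-chip boundary
# at least once. This is NOT the tighter, buffer-size-aware bound derived in the
# paper; it only anchors why communication scales with the layer's data volume.
def compulsory_offchip_traffic(n, c, h, w, k, r, s, out_h, out_w, bytes_per=2):
    inputs  = n * c * h * w          # activations read from off-chip
    weights = k * c * r * s          # filter weights read from off-chip
    outputs = n * k * out_h * out_w  # results written back off-chip
    return (inputs + weights + outputs) * bytes_per

# Example: batch 1, 64->128 channels, 56x56 maps, 3x3 kernels, stride 1, same padding.
print(compulsory_offchip_traffic(1, 64, 56, 56, 128, 3, 3, 56, 56), "bytes")
```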