2020 IEEE International Symposium on High Performance Computer Architecture (HPCA): Latest Publications

BBS: Micro-Architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00041
Liang Zhu, Chao Chen, Zihao Su, Weiguang Chen, Tao Li, Zhibin Yu
{"title":"BBS: Micro-Architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set","authors":"Liang Zhu, Chao Chen, Zihao Su, Weiguang Chen, Tao Li, Zhibin Yu","doi":"10.1109/HPCA47549.2020.00041","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00041","url":null,"abstract":"Due to the decentralization, irreversibility, and traceability, blockchain has attracted significant attention and has been deployed in many critical industries such as banking and logistics. However, the micro-architecture characteristics of blockchain programs still remain unclear. What's worse, the large number of micro-architecture events make understanding the characteristics extremely difficult. We even lack a systematic approach to identify the important events to focus on. In this paper, we propose a novel benchmarking methodology dubbed BBS to characterize blockchain programs at micro-architecture level. The key is to leverage fuzzy set theory to identify important micro-architecture events after the significance of them is quantified by a machine learning based approach. The important events for single programs are employed to characterize the programs while the common important events for multiple programs form an importance vector which is used to measure the similarity between benchmarks. We leverage BBS to characterize seven and six benchmarks from Blockbench and Caliper, respectively. The results show that BBS can reveal interesting findings. Moreover, by leveraging the importance characterization results, we improve that the transaction throughput of Smallbank from Fabric by 70% while reduce the transaction latency by 55%. In addition, we find that three of seven and two of six benchmarks from Blockbench and Caliper are redundant, respectively.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129628799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
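The entry above builds an importance vector of micro-architecture events per benchmark and compares benchmarks through that vector. The following Python sketch is an illustration only, not the authors' code: it scores events with an off-the-shelf tree ensemble (the paper additionally applies fuzzy set theory to select the important events) and compares two benchmarks by cosine similarity; the function names, synthetic data, and model choice are all assumptions.

```python
# Illustrative sketch (not the authors' code): quantify the significance of
# micro-architecture events for a benchmark with a tree-ensemble model, then
# compare two benchmarks by the cosine similarity of their importance vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def importance_vector(event_counts, perf_metric):
    """event_counts: (samples x events) PMU readings; perf_metric: e.g. IPC per sample."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(event_counts, perf_metric)
    return model.feature_importances_          # one weight per micro-architecture event

def similarity(vec_a, vec_b):
    """Cosine similarity between two importance vectors (1.0 = identical emphasis)."""
    return float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Synthetic demo: two benchmarks measured over 100 runs and 20 PMU events each.
rng = np.random.default_rng(0)
events_a, ipc_a = rng.random((100, 20)), rng.random(100)
events_b, ipc_b = rng.random((100, 20)), rng.random(100)
print(similarity(importance_vector(events_a, ipc_a),
                 importance_vector(events_b, ipc_b)))
```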
EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00016
Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović
{"title":"EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals","authors":"Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović","doi":"10.1109/HPCA47549.2020.00016","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00016","url":null,"abstract":"Side-channel attacks have become a serious security concern for computing systems, especially for embedded devices, where the device is often located in, or in proximity to, a public place, and yet the system contains sensitive information. To design systems that are highly resilient to such attacks, an accurate and efficient design-stage quantitative analysis of side-channel leakage is needed. For many systems properties (e.g., performance, power, etc.), cycle-accurate simulation can provide such an efficient-yet-accurate design-stage estimate. Unfortunately, for an important class of side-channels, electromagnetic emanations, such a model does not exist, and there has not even been much quantitative evidence about what level of modeling detail (e.g., hardware, microarchitecture, etc.) would be needed for high accuracy. This paper presents EMSim, an approach that enables simulation of the electromagnetic (EM) side-channel signals cycle-by-cycle using a detailed micro-architectural model of the device. To evaluate EMSim, we compare its signals against actual EM signals emanated from real hardware (FPGA-based RISC-V processor), and find that they match very closely. To gain further insights, we also experimentally identify how the accuracy of the simulation degrades when key microarchitectural features (e.g., pipeline stall, cache-miss, etc.) and other hardware behaviors (e.g., data-dependent switching activity) are omitted from the simulation model. We further evaluate how robust the simulation-based results are, by comparing them to real signals collected in different conditions (manufacturing, distance, etc.). Finally, to show the applicability of EMSim, we demonstrate how it can be used to measure side-channel leakage through simulation at design-stage.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129753659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
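As a rough intuition for what a cycle-by-cycle EM model computes, the toy Python sketch below treats the emitted signal at each cycle as a weighted sum of per-unit switching activity. It is not EMSim; the unit names, weights, and activity counts are invented for illustration.

```python
# Illustrative sketch (not EMSim itself): a toy cycle-by-cycle EM model in which
# the emitted signal at each cycle is a weighted sum of per-unit switching
# activity. Unit names and weights are made up for illustration.
import numpy as np

UNIT_WEIGHTS = {            # hypothetical per-unit contribution to the EM signal
    "fetch":   0.4,
    "alu":     1.0,
    "cache":   0.7,
    "regfile": 0.3,
}

def em_trace(activity_per_cycle):
    """activity_per_cycle: list of dicts mapping unit name -> toggled bits that cycle."""
    trace = []
    for cycle in activity_per_cycle:
        sample = sum(UNIT_WEIGHTS[unit] * toggles for unit, toggles in cycle.items())
        trace.append(sample)
    return np.asarray(trace)

# Two instruction sequences whose differing activity would yield distinguishable traces.
seq_a = [{"fetch": 8, "alu": 30, "cache": 0,  "regfile": 4} for _ in range(4)]
seq_b = [{"fetch": 8, "alu": 2,  "cache": 60, "regfile": 4} for _ in range(4)]
print(em_trace(seq_a), em_trace(seq_b))
```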
Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00052
Marzieh Lenjani, Patricia Gonzalez-Guerrero, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, S. Eilert, M. Stan, K. Skadron
{"title":"Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators","authors":"Marzieh Lenjani, Patricia Gonzalez-Guerrero, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, S. Eilert, M. Stan, K. Skadron","doi":"10.1109/HPCA47549.2020.00052","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00052","url":null,"abstract":"In-situ approaches process data very close to the memory cells, in the row buffer of each subarray. This minimizes data movement costs and affords parallelism across subarrays. However, current in-situ approaches are limited to only row-wide bitwise (or few-bit) operations applied uniformly across the row buffer. They impose a significant overhead of multiple row activations for emulating 32-bit addition and multiplications using bitwise operations and cannot support operations with data dependencies or based on predicates. Moreover, with current peripheral logic, communication among subarrays is inefficient, and with typical data layouts, bits in a word are not physically adjacent. The key insight of this work is that in-situ, single-word ALUs outperform in-situ, parallel, row-wide, bitwise ALUs by reducing the number of row activations and enabling new operations and optimizations. Our proposed lightweight access and control mechanism, Fulcrum, sequentially feeds data into the single-word ALU and enables operations with data dependencies and operations based on a predicate. For algorithms that require communication among subarrays, we augment the peripheral logic with broadcasting capabilities and a previously-proposed method for low-cost inter-subarray data movement. The sequential processor also enables overlapping of broadcasting and computation, and reuniting bits that are physically adjacent. In order to realize true subarray-level parallelism, we introduce a lightweight column-selection mechanism through shifting one-hot encoded values. This technique enables independent column selection in each subarray. We integrate Fulcrum with Compress Express Link (CXL), a new interconnect standard. Fulcrum with one memory stack delivers on average (up to) 23.4 (76) speedup over a server-class GPU, NVIDIA P100, with three stacks of HBM2 memory, (ii) 70 (228) times speedup per memory stack over the GPU, and (iii) 19 (178.9) times speedup per memory stack over an ideal model of the GPU, which only accounts for the overhead of data movement.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134200248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
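The column-selection idea in the entry above, selecting one word at a time by shifting a one-hot value, can be illustrated with the hedged Python sketch below. It is a software analogy, not the Fulcrum hardware; the row-buffer width, word size, and helper names are assumptions.

```python
# Illustrative sketch (not the Fulcrum RTL): column selection by shifting a
# one-hot value, so each subarray walks its row buffer independently while a
# single-word ALU consumes one word per step. All structures are assumptions.
def select_columns(row_buffer, word_bits=32):
    """Yield consecutive words from a row buffer using a shifting one-hot selector."""
    num_words = len(row_buffer) // word_bits
    onehot = 1                                   # one-hot register: bit i selects word i
    for _ in range(num_words):
        idx = onehot.bit_length() - 1            # position of the single set bit
        yield row_buffer[idx * word_bits:(idx + 1) * word_bits]
        onehot <<= 1                             # shift to select the next word

row = [1] * 256                                  # a 256-bit row buffer (8 x 32-bit words)
for word in select_columns(row):
    pass                                         # a single-word ALU would operate here
```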
A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00020
Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović
{"title":"A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit","authors":"Nader Sehatbakhsh, B. Yilmaz, A. Zajić, Milos Prvulović","doi":"10.1109/HPCA47549.2020.00020","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00020","url":null,"abstract":"This paper presents a new micro-architectural vulnerability on the power management units of modern computers which creates an electromagnetic-based side-channel. The key observations that enable us to discover this sidechannel are: 1) in an effort to manage and minimize power consumption, modern microprocessors have a number of possible operating modes (power states) in which various sub-systems of the processor are powered down, 2) for some of the transitions between power states, the processor also changes the operating mode of the voltage regulator module (VRM) that supplies power to the affected sub-system, and 3) the electromagnetic (EM) emanations from the VRM are heavily dependent on its operating mode. As a result, these state-dependent EM emanations create a side-channel which can potentially reveal sensitive information about the current state of the processor and, more importantly, the programs currently being executed. To demonstrate the feasibility of exploiting this vulnerability, we create a covert channel by utilizing the changes in the processor's power states. We show how such a covert channel can be leveraged to exfiltrate sensitive information from a secured and completely isolated (air-gapped) laptop system by placing a compact, inexpensive receiver in proximity to that system. To further show the severity of this attack, we also demonstrate how such a covert channel can be established when the target and the receiver are several meters away from each other, including scenarios where the receiver and the target are separated by a wall. Compared to the state-of-the-art, the proposed covert channel has >3x higher bit-rate. Finally, to demonstrate that this new vulnerability is not limited to being used as a covert channel, we demonstrate how it can be used for attacks such as keystroke logging.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133301750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
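For intuition about how a power-state covert channel can encode data, the sketch below maps each bit to a window of either busy computation (keeping the core and its VRM in an active state) or idle sleep (letting them drop to a deeper power state). This is a generic illustration, not the paper's transmitter; the window length and workload are arbitrary assumptions, and a real attack would sense the resulting EM emanations with an external receiver.

```python
# Illustrative sketch only (not the paper's transmitter): a generic power-state
# covert-channel encoder that maps each bit to a window of either busy compute
# or idle sleep, producing power-state transitions a nearby receiver could
# distinguish. Window length and the busy workload are assumptions.
import time

def send_bits(bits, window_s=0.05):
    for bit in bits:
        end = time.perf_counter() + window_s
        if bit:
            while time.perf_counter() < end:   # busy loop: keeps an active power state
                _ = sum(i * i for i in range(1000))
        else:
            time.sleep(window_s)               # idle: processor enters a deeper state

send_bits([1, 0, 1, 1, 0])                      # transmits the pattern 10110
```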
DRAM-Less: Hardware Acceleration of Data Processing with New Memory
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00032
Jie Zhang, Gyuyoung Park, D. Donofrio, J. Shalf, Myoungsoo Jung
{"title":"DRAM-Less: Hardware Acceleration of Data Processing with New Memory","authors":"Jie Zhang, Gyuyoung Park, D. Donofrio, J. Shalf, Myoungsoo Jung","doi":"10.1109/HPCA47549.2020.00032","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00032","url":null,"abstract":"General purpose hardware accelerators have become major data processing resources in many computing domains. However, the processing capability of hardware accelerations is often limited by costly software interventions and memory copies to support compulsory data movement between different processors and solid-state drives (SSDs). This in turn also wastes a significant amount of energy in modern accelerated systems. In this work, we propose, DRAM-less, a hardware automation approach that precisely integrates many state-of-the-art phase change memory (PRAM) modules into its data processing network to dramatically reduce unnecessary data copies with a minimum of software modifications. We implement a new memory controller that plugs a real 3x nm multi-partition PRAM to 28nm technology FPGA logic cells and interoperate its design into a real PCIe accelerator emulation platform. The evaluation results reveal that our DRAM-less achieves, on average, 47% better performance than advanced acceleration approaches that use a peer-to-peer DMA.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124011422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00057
Jawad Haj-Yahya, Yiannakis Sazeides, M. Alser, Efraim Rotem, O. Mutlu
{"title":"Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices","authors":"Jawad Haj-Yahya, Yiannakis Sazeides, M. Alser, Efraim Rotem, O. Mutlu","doi":"10.1109/HPCA47549.2020.00057","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00057","url":null,"abstract":"Modern mobile devices, such as smartphones, tablets, and laptops, are idle most of the time but they remain connected to communication channels even when idle. This operation mode is called connected-standby. To increase battery life in the connected-standby mode, a mobile device enters the deepest-runtime-idle-power state (DRIPS), which minimizes power consumption and retains fast wake-up capability. In this work, we identify three sources of energy inefficiency in modern DRIPS designs and introduce three techniques to reduce the power consumption of mobile devices in connected-standby. To our knowledge, this is the first work to explicitly focus on and improve the connected-standby power management of high-performance mobile devices, with evaluations on a real system. We propose the optimized-deepest-runtime-idle-power state (ODRIPS), a mechanism that dynamically: 1) offloads the monitoring of wake-up events to low-power off-chip circuitry, which enables turning off all of the processor's clock sources, 2) offloads all of the processor's input/output functionality off-chip and power-gates the corresponding on-chip input/output functions, and 3) transfers the processor's context to a secure memory region inside DRAM, which eliminates the need to store the context using on-chip high-leakage SRAMs, thereby reducing leakage power. We implement ODRIPS in Intel's Skylake client processor and its associated Sunrise-Point chipset. An analysis of ODRIPS on a real system reveals that it reduces the platform average power consumption in connected-standby mode by 22%. We also identify an opportunity to further reduce ODRIPS power by using emerging low-power non-volatile memory (instead of DRAM) to store the processor context.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Experiences with ML-Driven Design: A NoC Case Study
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI: 10.1109/HPCA47549.2020.00058
Jieming Yin, Subhash Sethumurugan, Yasuko Eckert, Chintan Patel, Alan Smith, Eric Morton, M. Oskin, Natalie D. Enright Jerger, G. Loh
{"title":"Experiences with ML-Driven Design: A NoC Case Study","authors":"Jieming Yin, Subhash Sethumurugan, Yasuko Eckert, Chintan Patel, Alan Smith, Eric Morton, M. Oskin, Natalie D. Enright Jerger, G. Loh","doi":"10.1109/HPCA47549.2020.00058","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00058","url":null,"abstract":"There has been a lot of recent interest in applying machine learning (ML) to the design of systems, which purports to aid human experts in extracting new insights leading to better systems. In this work, we share our experiences with applying ML to improve one aspect of networks-on-chips (NoC) to uncover new ideas and approaches, which eventually led us to a new arbitration scheme that is effective for NoCs under heavy contention. However, a significant amount of human effort and creativity was still needed to optimize just one aspect (arbitration) of what is only one component (the NoC) of the overall processor. This leads us to conclude that much work (and opportunity!) remains to be done in the area of ML-driven architecture design.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127887965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Domain-Specialized Cache Management for Graph Analytics
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-01-22 DOI: 10.1109/HPCA47549.2020.00028
P. Faldu, Jeff Diamond, Boris Grot
{"title":"Domain-Specialized Cache Management for Graph Analytics","authors":"P. Faldu, Jeff Diamond, Boris Grot","doi":"10.1109/HPCA47549.2020.00028","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00028","url":null,"abstract":"Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected, hot, vertices inherently exhibit high reuse. However, this work finds that state-of-the-art hardware cache management schemes struggle in capitalizing on their reuse due to highly irregular access patterns of graph analytics. In response, we propose GRASP, domain-specialized cache management at the last-level cache for graph analytics. GRASP augments existing cache policies to maximize reuse of hot vertices by protecting them against cache thrashing, while maintaining sufficient flexibility to capture the reuse of other vertices as needed. GRASP keeps hardware cost negligible by leveraging lightweight software support to pinpoint hot vertices, thus eliding the need for storage-intensive prediction mechanisms employed by state-of-the-art cache management schemes. On a set of diverse graph-analytic applications with large high-skew graph datasets, GRASP outperforms prior domain-agnostic schemes on all datapoints, yielding an average speed-up of 4.2% (max 9.4%) over the best-performing prior scheme. GRASP remains robust on low-/no-skew datasets, whereas prior schemes consistently cause a slowdown.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
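As a software analogy for the cache policy described above (not the GRASP hardware), the sketch below models one cache set in which lines tagged as holding hot-vertex data are inserted at a high, protected priority so that lower-priority lines are evicted first; the priority values and eviction rule are assumptions.

```python
# Illustrative sketch (a software analogy, not the GRASP hardware): an LLC-like
# set where hot-vertex lines get a high, protected priority and the victim is
# always a lowest-priority line, so hot lines survive thrashing by cold lines.
class ProtectedSet:
    def __init__(self, ways=16):
        self.ways = ways
        self.lines = {}                      # addr -> priority (higher = keep longer)

    def access(self, addr, is_hot):
        if addr in self.lines:               # hit: re-promote, capping cold lines low
            self.lines[addr] = 3 if is_hot else min(self.lines[addr] + 1, 2)
            return "hit"
        if len(self.lines) >= self.ways:     # miss in a full set: evict a
            victim = min(self.lines, key=self.lines.get)   # lowest-priority victim
            del self.lines[victim]
        self.lines[addr] = 3 if is_hot else 1
        return "miss"

llc_set = ProtectedSet()
for addr, hot in [(0xA0, True), (0xB0, False), (0xA0, True)]:
    print(hex(addr), llc_set.access(addr, hot))
```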
HyGCN: A GCN Accelerator with Hybrid Architecture
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-01-07 DOI: 10.1109/HPCA47549.2020.00012
Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie
{"title":"HyGCN: A GCN Accelerator with Hybrid Architecture","authors":"Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie","doi":"10.1109/HPCA47549.2020.00012","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00012","url":null,"abstract":"Inspired by the great success of neural networks, graph convolutional neural networks (GCNs) are proposed to analyze graph data. GCNs mainly include two phases with distinct execution patterns. The Aggregation phase, behaves as graph processing, showing a dynamic and irregular execution pattern. The Combination phase, acts more like the neural networks, presenting a static and regular execution pattern. The hybrid execution patterns of GCNs require a design that alleviates irregularity and exploits regularity. Moreover, to achieve higher performance and energy efficiency, the design needs to leverage the high intra-vertex parallelism in Aggregation phase, the highly reusable inter-vertex data in Combination phase, and the opportunity to fuse phase-by-phase execution introduced by the new features of GCNs. However, existing architectures fail to address these demands. In this work, we first characterize the hybrid execution patterns of GCNs on Intel Xeon CPU. Guided by the characterization, we design a GCN accelerator, HyGCN, using a hybrid architecture to efficiently perform GCNs. Specifically, first, we build a new programming model to exploit the fine-grained parallelism for our hardware design. Second, we propose a hardware design with two efficient processing engines to alleviate the irregularity of Aggregation phase and leverage the regularity of Combination phase. Besides, these engines can exploit various parallelism and reuse highly reusable data efficiently. Third, we optimize the overall system via inter-engine pipeline for inter-phase fusion and priority-based off-chip memory access coordination to improve off-chip bandwidth utilization. Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA V100 GPU, our work achieves on average 1509× speedup with 2500× energy reduction and average 6.5× speedup with 10× energy reduction, respectively.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116975333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 203
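The two GCN phases contrasted above can be made concrete with a minimal NumPy sketch: an irregular, gather-style Aggregation over each vertex's neighbors followed by a regular, dense Combination matrix multiply. The sketch is illustrative only; the tiny graph, the shapes, and the self-loop handling are assumptions, and it does not reflect HyGCN's hardware design.

```python
# Illustrative sketch (not HyGCN itself): the two GCN phases the abstract
# contrasts -- an irregular, per-vertex Aggregation over neighbor features,
# followed by a regular, dense-matrix Combination shared by all vertices.
import numpy as np

def gcn_layer(neighbors, features, weights):
    # Aggregation phase: irregular gather-and-sum over each vertex's neighbors.
    aggregated = np.stack([
        features[nbrs].sum(axis=0) + features[v]        # include a self-loop
        for v, nbrs in enumerate(neighbors)
    ])
    # Combination phase: regular dense matrix multiply plus ReLU.
    return np.maximum(aggregated @ weights, 0.0)

neighbors = [np.array([1, 2]), np.array([0]), np.array([0, 1])]   # 3-vertex graph
features  = np.random.rand(3, 8)                                  # 8 features per vertex
weights   = np.random.rand(8, 4)                                  # 8 -> 4 transformation
print(gcn_layer(neighbors, features, weights).shape)              # (3, 4)
```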
Communication Lower Bound in Convolution Accelerators
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-11-08 DOI: 10.1109/HPCA47549.2020.00050
Xiaoming Chen, Yinhe Han, Yu Wang
{"title":"Communication Lower Bound in Convolution Accelerators","authors":"Xiaoming Chen, Yinhe Han, Yu Wang","doi":"10.1109/HPCA47549.2020.00050","DOIUrl":"https://doi.org/10.1109/HPCA47549.2020.00050","url":null,"abstract":"In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides comprehensive analysis and methodologies to minimize the communication for CNN accelerators. For the off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow to reach the lower bound. This fundamental problem has never been solved by prior studies. The on-chip communication is minimized based on an elaborate workload and storage mapping scheme. We in addition design a communication-optimal CNN accelerator architecture. Evaluations based on the 65nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and it is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114927758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
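The paper derives a buffer-size-aware lower bound on off-chip traffic; that derivation is not reproduced here. As a much weaker baseline for intuition, the sketch below counts only compulsory traffic, i.e., every input, weight, and output element of a convolutional layer must cross the off-chip boundary at least once; the layer shape and the 2-byte data width are arbitrary assumptions.

```python
# Illustrative sketch: a baseline compulsory-traffic figure for one convolutional
# layer -- every input, weight, and output element crosses the off-chip boundary
# at least once. This is NOT the tighter, buffer-size-aware bound derived in the
# paper; it only anchors why communication scales with the layer's data volume.
def compulsory_offchip_traffic(n, c, h, w, k, r, s, out_h, out_w, bytes_per=2):
    inputs  = n * c * h * w          # activations read from off-chip
    weights = k * c * r * s          # filter weights read from off-chip
    outputs = n * k * out_h * out_w  # results written back off-chip
    return (inputs + weights + outputs) * bytes_per

# Example: batch 1, 64->128 channels, 56x56 maps, 3x3 kernels, stride 1, same padding.
print(compulsory_offchip_traffic(1, 64, 56, 56, 128, 3, 3, 56, 56), "bytes")
```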