Gokul Krishnan, A. Goksoy, Sumit K. Mandal, Zhenyu Wang, C. Chakrabarti, Jae-sun Seo, U. Ogras, Yu Cao
{"title":"Big-Little Chiplets for In-Memory Acceleration of DNNs: A Scalable Heterogeneous Architecture","authors":"Gokul Krishnan, A. Goksoy, Sumit K. Mandal, Zhenyu Wang, C. Chakrabarti, Jae-sun Seo, U. Ogras, Yu Cao","doi":"10.1145/3508352.3549447","DOIUrl":"https://doi.org/10.1145/3508352.3549447","url":null,"abstract":"Monolithic in-memory computing (IMC) architectures face significant yield and fabrication cost challenges as the complexity of DNNs increases. Chiplet-based IMCs that integrate multiple dies with advanced 2.5D/3D packaging offers a low-cost and scalable solution. They enable heterogeneous architectures where the chiplets and their associated interconnection can be tailored to the non-uniform algorithmic structures to maximize IMC utilization and reduce energy consumption. This paper proposes a heterogeneous IMC architecture with big-little chiplets and a hybrid network-on-package (NoP) to optimize the utilization, interconnect bandwidth, and energy efficiency. For a given DNN, we develop a custom methodology to map the model onto the big-little architecture such that the early layers in the DNN are mapped to the little chiplets with higher NoP bandwidth and the subsequent layers are mapped to the big chiplets with lower NoP bandwidth. Furthermore, we achieve a scalable solution by incorporating a DRAM into each chiplet to support a wide range of DNNs beyond the area limit. Compared to a homogeneous chiplet-based IMC architecture, the proposed big-little architecture achieves up to 329× improvement in the energy-delay-area product (EDAP) and up to 2× higher IMC utilization. Experimental evaluation of the proposed big-little chiplet-based RRAM IMC architecture for ResNet-50 on ImageNet shows 259×, 139×, and 48× improvement in energy-efficiency at lower area compared to Nvidia V100 GPU, Nvidia T4 GPU, and SIMBA architecture, respectively.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115618410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast, Robust and Accurate Detection of Cache-based Spectre Attack Phases","authors":"A. Pashrashid, Ali Hajiabadi, Trevor E. Carlson","doi":"10.1145/3508352.3549330","DOIUrl":"https://doi.org/10.1145/3508352.3549330","url":null,"abstract":"Modern processors achieve high performance and efficiency by employing techniques such as speculative execution and sharing resources such as caches. However, recent attacks like Spectre and Meltdown exploit the speculative execution of modern processors to leak sensitive information from the system. Many mitigation strategies have been proposed to restrict the speculative execution of processors and protect potential side-channels. Currently, these techniques have shown a significant performance overhead. A solution that can detect memory leaks before the attacker has a chance to exploit them would allow the processor to reduce the performance overhead by enabling protections only when the system is at risk.In this paper, we propose a mechanism to detect speculative execution attacks that use caches as a side-channel. In this detector we track the phases of a successful attack and raise an alert before the attacker gets a chance to recover sensitive information. We accomplish this through monitoring the microarchitectural changes in the core and caches, and detect the memory locations that can be potential memory data leaks. We achieve 100% accuracy and negligible false positive rate in detecting Spectre attacks and evasive versions of Spectre that the state-of-the-art detectors are unable to detect. Our detector has no performance overhead with negligible power and area overheads.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121031663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Testing Machine-Learning Hardware: A Virtuous Cycle∗","authors":"Arjun Chaudhuri, Jonti Talukdar, K. Chakrabarty","doi":"10.1145/3508352.3561121","DOIUrl":"https://doi.org/10.1145/3508352.3561121","url":null,"abstract":"The ubiquitous application of deep neural networks (DNN) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. While supervised learning techniques can be used to accurately estimate fault criticality, it requires a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. We further describe a topological and probabilistic framework to estimate the expected number of PE’s primary outputs (POs) flipping in the presence of defects and use the PO-flip count as a surrogate for determining fault criticality. We demonstrate that the combination of PO-flip count and neural twin-enabled sensitivity analysis of internal nets can be used as additional features in existing ML-based criticality classifiers.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121349038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lilas Alrahis, Satwik Patnaik, M. Shafique, O. Sinanoglu
{"title":"Embracing Graph Neural Networks for Hardware Security","authors":"Lilas Alrahis, Satwik Patnaik, M. Shafique, O. Sinanoglu","doi":"10.1145/3508352.3561096","DOIUrl":"https://doi.org/10.1145/3508352.3561096","url":null,"abstract":"Graph neural networks (GNNs) have attracted increasing attention due to their superior performance in deep learning on graph-structured data. GNNs have succeeded across various domains such as social networks, chemistry, and electronic design automation (EDA). Electronic circuits have a long history of being represented as graphs, and to no surprise, GNNs have demonstrated state-of-the-art performance in solving various EDA tasks. More importantly, GNNs are now employed to address several hardware security problems, such as detecting intellectual property (IP) piracy and hardware Trojans (HTs), to name a few.In this survey, we first provide a comprehensive overview of the usage of GNNs in hardware security and propose the first taxonomy to divide the state-of-the-art GNN-based hardware security systems into four categories: (i) HT detection systems, (ii) IP piracy detection systems, (iii) reverse engineering platforms, and (iv) attacks on logic locking. We summarize the different architectures, graph types, node features, benchmark data sets, and model evaluation of the employed GNNs. Finally, we elaborate on the lessons learned and discuss future directions.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125570212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Olympia Axelou, N. Evmorfopoulos, G. Floros, G. Stamoulis, S. Sapatnekar
{"title":"A Novel Semi-Analytical Approach for Fast Electromigration Stress Analysis in Multi-Segment Interconnects","authors":"Olympia Axelou, N. Evmorfopoulos, G. Floros, G. Stamoulis, S. Sapatnekar","doi":"10.1145/3508352.3549476","DOIUrl":"https://doi.org/10.1145/3508352.3549476","url":null,"abstract":"As integrated circuit technologies move below 10 nm, Electromigration (EM) has become an issue of great concern for the longterm reliability due to the stricter performance, thermal and power requirements. The problem of EM becomes even more pronounced in power grids due to the large unidirectional currents flowing in these structures. The attention for EM analysis during the past years has been drawn to accurate physics-based models describing the interplay between the electron wind force and the back stress force, in a single Partial Differential Equation (PDE) involving wire stress. In this paper, we present a fast semi-analytical approach for the solution of the stress PDE at discrete spatial points in multi-segment lines of power grids, which allows the analytical calculation of EM stress independently at any time in these lines. Our method exploits the specific form of the discrete stress coefficient matrix whose eigenvalues and eigenvectors are known beforehand. Thus, a closed-form equation can be constructed with almost linear time complexity without the need of time discretization. This closed-form equation can be subsequently used at any given time in transient stress analysis. Our experimental results, using the industrial IBM power grid benchmarks, demonstrate that our method has excellent accuracy compared to the industrial tool COMSOL while being orders of magnitude times faster.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"282 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122088583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinwook Jung, A. Kahng, R. Varadarajan, Zhiang Wang
{"title":"IEEE CEDA DATC: Expanding Research Foundations for IC Physical Design and ML-Enabled EDA","authors":"Jinwook Jung, A. Kahng, R. Varadarajan, Zhiang Wang","doi":"10.1145/3508352.3561379","DOIUrl":"https://doi.org/10.1145/3508352.3561379","url":null,"abstract":"This paper describes new elements in the RDF-2022 release of the DATC Robust Design Flow, along with other activities of the IEEE CEDA DATC. The RosettaStone initiated with RDF-2021 has been augmented to include 35 benchmarks and four open-source technologies (ASAP7, NanGate45 and SkyWater130HS/HD), plus timing-sensible versions created using path-cutting. The Hier-RTLMP macro placer is now part of DATC RDF, enabling macro placement for large modern designs with hundreds of macros. To establish a clear baseline for macro placers, new open-source benchmark suites on open PDKs, with corresponding flows for fully reproducible results, are provided. METRICS2.1 infrastructure in OpenROAD and OpenROAD-flow-scripts now uses native JSON metrics reporting, which is more robust and general than the previous Python script-based method. Calibrations on open enablements have also seen notable updates in the RDF. Finally, we also describe an approach to establishing a generic, cloud-native large-scale design of experiments for ML-enabled EDA. Our paper closes with future research directions related to DATC’s efforts.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128539418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Advancing Physical Design using Graph Neural Networks (Invited Paper)","authors":"Yi-Chen Lu, S. Lim","doi":"10.1145/3508352.3561094","DOIUrl":"https://doi.org/10.1145/3508352.3561094","url":null,"abstract":"As modern Physical Design (PD) algorithms and methodologies evolve into the post-Moore era with the aid of machine learning, Graph Neural Networks (GNNs) are becoming increasingly ubiquitous given that netlists are essentially graphs. Recently, their ability to perform effective graph learning has provided significant insights to understand the underlying dynamics during netlist-to-layout transformations. GNNs follow a message-passing scheme, where the goal is to construct meaningful representations either at the entire graph or node-level by recursively aggregating and transforming the initial features. In the realm of PD, the GNN-learned representations have been leveraged to solve the tasks such as cell clustering, quality-of-result prediction, activity simulation, etc., which often overcome the limitations of traditional PD algorithms. In this work, we first revisit recent advancements that GNNs have made in PD. Second, we discuss how GNNs serve as the backbone of novel PD flows. Finally, we present our thoughts on ongoing and future PD challenges that GNNs can tackle and succeed.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126880778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Heterogeneity-aware Federated Search Towards Better Accuracy and Energy Efficiency","authors":"Zhao Yang, Qingshuang Sun","doi":"10.1145/3508352.3549403","DOIUrl":"https://doi.org/10.1145/3508352.3549403","url":null,"abstract":"Federated learning (FL), a new distributed technology, allows us to train the global model on the edge and embedded devices without local data sharing. However, due to the wide distribution of different types of devices, FL faces severe heterogeneity issues. The accuracy and efficiency of FL deployment at the edge are severely impacted by heterogeneous data and heterogeneous systems. In this paper, we perform joint FL model personalization for heterogeneous systems and heterogeneous data to address the challenges posed by heterogeneities. We begin by using model inference efficiency as a starting point to personalize network scale on each node. Furthermore, it can be used to guide the efficient FL training process, which can help to ease the problem of straggler devices and improve FL’s energy efficiency. During FL training, federated search is then used to acquire highly accurate personalized network structures. By taking into account the unique characteristics of FL deployment at edge devices, the personalized network structures obtained by our federated search framework with a lightweight search controller can achieve competitive accuracy with state-of-the-art (SOTA) methods, while reducing inference and training energy consumption by up to 3.57× and 1.82×, respectively.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"9 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123650829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maico Cassel dos Santos, Tianyu Jia, M. Cochet, Karthik Swaminathan, Joseph Zuckerman, Paolo Mantovani, Davide Giri, J. Zhang, Erik Jens Loscalzo, Gabriele Tombesi, Kevin Tien, Nandhini Chandramoorthy, J. Wellman, David Brooks, Gu-Yeon Wei, K. Shepard, L. Carloni, P. Bose
{"title":"A Scalable Methodology for Agile Chip Development with Open-Source Hardware Components : (Invited Paper)","authors":"Maico Cassel dos Santos, Tianyu Jia, M. Cochet, Karthik Swaminathan, Joseph Zuckerman, Paolo Mantovani, Davide Giri, J. Zhang, Erik Jens Loscalzo, Gabriele Tombesi, Kevin Tien, Nandhini Chandramoorthy, J. Wellman, David Brooks, Gu-Yeon Wei, K. Shepard, L. Carloni, P. Bose","doi":"10.1145/3508352.3561102","DOIUrl":"https://doi.org/10.1145/3508352.3561102","url":null,"abstract":"We present a scalable methodology for the agile physical design of tile-based heterogeneous system-on-chip (SoC) architectures that simplifies the reuse and integration of open-source hardware components. The methodology leverages the regularity of the on-chip communication infrastructure, which is based on a multi-plane network-on-chip (NoC), and the modularity of socket interfaces, which connect the tiles to the NoC. Each socket also provides its tile with a set of platform services, including independent clocking and voltage control. As a result, the physical design of each tile can be decoupled from its location in the top-level floorplan of the SoC and the overall SoC design can benefit from a hierarchical timing-closure flow, design reuse and, if necessary, fast respin. With the proposed methodology we completed two SoC tapeouts of increasing complexity, which illustrate its capabilities and the resulting gains in terms of design productivity.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121797923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HECTOR: A Multi-level Intermediate Representation for Hardware Synthesis Methodologies","authors":"Ruifan Xu, You-lin Xiao, Jin Luo, Yun Liang","doi":"10.1145/3508352.3549370","DOIUrl":"https://doi.org/10.1145/3508352.3549370","url":null,"abstract":"Hardware synthesis requires a complicated process to generate synthesizable register transfer level (RTL) code. High-level synthesis tools can automatically transform a high-level description into hardware design, while hardware generators adopt domain specific languages and synthesis flows for specific applications. The implementation of these tools generally requires substantial engineering efforts due to RTL’s weak expressivity and low level of abstraction. Furthermore, different synthesis tools adopt different levels of intermediate representations (IR) and transformations. A unified IR obviously is a good way to lower the engineering cost and get competitive hardware design rapidly by exploring different synthesis methodologies.In this paper, we propose Hector, a two-level IR providing a unified intermediate representation for hardware synthesis methodologies. The high-level IR binds computation with a control graph annotated with timing information, while the low-level IR provides a concise way to describe hardware modules and elastic interconnections among them. Implemented based on the multi-level compiler infrastructure (MLIR), Hector’s IRs can be converted to synthesizable RTL designs. To demonstrate the expressivity and versatility, we implement three synthesis approaches based on Hector: a high-level synthesis (HLS) tool, a systolic array generator, and a hardware accelerator. The hardware generated by Hector’s HLS approach is comparable to that generated by the state-of-the-art HLS tools, and the other two cases outperform HLS implementations in performance and productivity.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130238568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}