{"title":"ParTBC: Faster Estimation of Top-k Betweenness Centrality Vertices on GPU","authors":"Somesh Singh, Tejas Shah, R. Nasre","doi":"10.1145/3486613","DOIUrl":"https://doi.org/10.1145/3486613","url":null,"abstract":"Betweenness centrality (BC) is a popular centrality measure, based on shortest paths, used to quantify the importance of vertices in networks. It is used in a wide array of applications including social network analysis, community detection, clustering, biological network analysis, and several others. The state-of-the-art Brandes’ algorithm for computing BC has time complexities of and for unweighted and weighted graphs, respectively. Brandes’ algorithm has been successfully parallelized on multicore and manycore platforms. However, the computation of vertex BC continues to be time-consuming for large real-world graphs. Often, in practical applications, it suffices to identify the most important vertices in a network; that is, those having the highest BC values. Such applications demand only the top vertices in the network as per their BC values but do not demand their actual BC values. In such scenarios, not only is computing the BC of all the vertices unnecessary but also exact BC values need not be computed. In this work, we attempt to marry controlled approximations with parallelization to estimate the k-highest BC vertices faster, without having to compute the exact BC scores of the vertices. We present a host of techniques to determine the top-k vertices faster, with a small inaccuracy, by computing approximate BC scores of the vertices. Aiding our techniques is a novel vertex-renumbering scheme to make the graph layout more structured, which results in faster execution of parallel Brandes’ algorithm on GPU. Our experimental results, on a suite of real-world and synthetic graphs, show that our best performing technique computes the top-k vertices with an average speedup of 2.5× compared to the exact parallel Brandes’ algorithm on GPU, with an error of less than 6%. Our techniques also exhibit high precision and recall, both in excess of 94%.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"6 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2021-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81777811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CeMux: Maximizing the Accuracy of Stochastic Mux Adders and an Application to Filter Design","authors":"T. Baker, J. Hayes","doi":"10.1145/3491213","DOIUrl":"https://doi.org/10.1145/3491213","url":null,"abstract":"Stochastic computing (SC) is a low-cost computational paradigm that has promising applications in digital filter design, image processing, and neural networks. Fundamental to these applications is the weighted addition operation, which is most often implemented by a multiplexer (mux) tree. Mux-based adders have very low area but typically require long bitstreams to reach practical accuracy thresholds when the number of summands is large. In this work, we first identify the main contributors to mux adder error. We then demonstrate with analysis and experiment that two new techniques, precise sampling and full correlation, can target and mitigate these error sources. Implementing these techniques in hardware leads to the design of CeMux (Correlation-enhanced Multiplexer), a stochastic mux adder that is significantly more accurate and uses much less area than traditional weighted adders. We compare CeMux to other SC and hybrid designs for an electrocardiogram filtering case study that employs a large digital filter. One major result is that CeMux is shown to be accurate even for large input sizes. CeMux's higher accuracy leads to a latency reduction of 4× to 16× over other designs. Furthermore, CeMux uses about 35% less area than existing designs, and we demonstrate that a small amount of accuracy can be traded for a further 50% reduction in area. Finally, we compare CeMux to a conventional binary design and we show that CeMux can achieve a 50% to 73% area reduction for similar power and latency as the conventional design but at a slightly higher level of error.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"1 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2021-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91518558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference","authors":"Chaojian Li, Wuyang Chen, Yuchen Gu, Tianlong Chen, Yonggan Fu, Zhangyang Wang, Yingyan Lin","doi":"10.1145/3510835","DOIUrl":"https://doi.org/10.1145/3510835","url":null,"abstract":"Semantic segmentation for scene understanding is nowadays widely demanded, raising significant challenges for the algorithm efficiency, especially its applications on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (“data-level”) and suffer from the expensive computation arising from the required multi-scale aggregation (“network level”). In both folds, the computational and energy costs in training and inference are notable due to the often desired large input resolutions and heavy computational burden of segmentation models. To this end, we propose DANCE, general automated DAta-Network Co-optimization for Efficient segmentation model training and inference. Distinct from existing efficient segmentation approaches that focus merely on light-weight network design, DANCE distinguishes itself as an automated simultaneous data-network co-optimization via both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming which adaptively downsamples/drops input images and controls their corresponding contribution to the training loss guided by the images’ spatial complexity. Such a downsampling operation, in addition to slimming down the cost associated with the input size directly, also shrinks the dynamic range of input object and context scales, therefore motivating us to also adaptively slim the network to match the downsampled data. Extensive experiments and ablating studies (on four SOTA segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE can achieve “all-win” towards efficient segmentation (reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU)). Specifically, DANCE can reduce ↓25%–↓77% energy consumption in training, ↓31%–↓56% in inference, while boosting the mIoU by ↓0.71%–↑ 13.34%.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"59 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90253228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pruthvy Yellu, Landon Buell, Miguel Mark, M. Kinsy, Dongpeng Xu, Qiaoyan Yu
{"title":"Security Threat Analyses and Attack Models for Approximate Computing Systems","authors":"Pruthvy Yellu, Landon Buell, Miguel Mark, M. Kinsy, Dongpeng Xu, Qiaoyan Yu","doi":"10.1145/3442380","DOIUrl":"https://doi.org/10.1145/3442380","url":null,"abstract":"Approximate computing (AC) represents a paradigm shift from conventional precise processing to inexact computation but still satisfying the system requirement on accuracy. The rapid progress on the development of diverse AC techniques allows us to apply approximate computing to many computation-intensive applications. However, the utilization of AC techniques could bring in new unique security threats to computing systems. This work does a survey on existing circuit-, architecture-, and compiler-level approximate mechanisms/algorithms, with special emphasis on potential security vulnerabilities. Qualitative and quantitative analyses are performed to assess the impact of the new security threats on AC systems. Moreover, this work proposes four unique visionary attack models, which systematically cover the attacks that build covert channels, compensate approximation errors, terminate normal error resilience mechanisms, and propagate additional errors. To thwart those attacks, this work further offers the guideline of countermeasure designs. Several case studies are provided to illustrate the implementation of the suggested countermeasures.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"1 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2021-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89728648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Automation for Tree-based Nearest Neighborhood–aware Placement of High-speed Cellular Automata on FPGA with Scan Path Insertion","authors":"Ayan Palchaudhuri, Sandeep Sharma, A. Dhar","doi":"10.1145/3446206","DOIUrl":"https://doi.org/10.1145/3446206","url":null,"abstract":"Cellular Automata (CA) is attractive for high-speed VLSI implementation due to modularity, cascadability, and locality of interconnections confined to neighboring logic cells. However, this outcome is not easily transferable to tree-structured CA, since the neighbors having half and double the index value of the current CA cell under question can be sufficiently distanced apart on the FPGA floor. Challenges to meet throughput requirements, seamlessly translate algorithmic modifications for changing application specifications to gate level architectures and to address reliability challenges of semiconductor chips are ever increasing. Thus, a proper design framework assisting automation of synthesizable, delay-optimized VLSI architecture descriptions facilitating testability is desirable. In this article, we have automated the generation of hardware description of tree-structured CA that includes a built-in scan path realized with zero area and delay overhead. The scan path facilitates seeding the CA, state modification, and fault localization on the FPGA fabric. Three placement algorithms were proposed to ensure maximum physical adjacency amongst neighboring CA cells, arranged in a multi-columnar fashion on the FPGA grid. Our proposed architectures outperform implementations arising out of standard placers and behavioral designs, existing tree mapping strategies, and state-of-the-art FPGA centric error detection architectures in area and speed.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"17 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2021-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81826034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient One-pass Synthesis for Digital Microfluidic Biochips","authors":"N. Mohammadzadeh, R. Wille, Oliver Keszöcze","doi":"10.1145/3446880","DOIUrl":"https://doi.org/10.1145/3446880","url":null,"abstract":"Digital microfluidics biochips are a promising emerging technology that provides fluidic experimental capabilities on a chip (i.e., following the lab-on-a-chip paradigm). However, the design of such biochips still constitutes a challenging task that is usually tackled by multiple individual design steps, such as binding, scheduling, placement, and routing. Performing these steps consecutively may lead to design gaps and infeasible results. To address these shortcomings, the concept of one-pass design for digital microfluidics biochips has recently been proposed—a holistic approach avoiding the design gaps by considering the whole synthesis process as large. But implementations of this concept available thus far suffer from either high computational effort or costly results. In this article, we present an efficient one-pass solution that is runtime efficient (i.e., rarely needing more than a second to successfully synthesize a design) while, at the same time, producing better results than previously published heuristic approaches. Experimental results confirm the benefits of the proposed solution and allow for realizing really large assays composed of thousands of operations in reasonable runtime.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"120 1","pages":"1 - 21"},"PeriodicalIF":0.0,"publicationDate":"2021-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90454117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. S. Rahman, Adib Nahiyan, Fahim Rahman, Saverio Fazzari, Kenneth Plaks, Farimah Farahmandi, Domenic Forte, M. Tehranipoor
{"title":"Security Assessment of Dynamically Obfuscated Scan Chain Against Oracle-guided Attacks","authors":"M. S. Rahman, Adib Nahiyan, Fahim Rahman, Saverio Fazzari, Kenneth Plaks, Farimah Farahmandi, Domenic Forte, M. Tehranipoor","doi":"10.1145/3444960","DOIUrl":"https://doi.org/10.1145/3444960","url":null,"abstract":"Logic locking has emerged as a promising solution to protect integrated circuits against piracy and tampering. However, the security provided by existing logic locking techniques is often thwarted by Boolean satisfiability (SAT)-based oracle-guided attacks. Criteria for successful SAT attacks on locked circuits include: (i) the circuit under attack is fully combinational, or (ii) the attacker has scan chain access. To address the threat posed by SAT-based attacks, we adopt the dynamically obfuscated scan chain (DOSC) architecture and illustrate its resiliency against the SAT attacks when inserted into the scan chain of an obfuscated design. We demonstrate, both mathematically and experimentally, that DOSC exponentially increases the resiliency against key extraction by SAT attack and its variants. Our results show that the mathematical estimation of attack complexity correlates to the experimental results with an accuracy of 95% or better. Along with the formal proof, we model DOSC architecture to its equivalent combinational circuit and perform SAT attack to evaluate its resiliency empirically. Our experiments demonstrate that SAT attack on DOSC-inserted benchmark circuits timeout at minimal test time overhead, and while DOSC requires less than 1% area and power overhead.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"55 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2021-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74709234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mitali Sinha, G. Harsha, Pramit Bhattacharyya, Sujay Deb
{"title":"Design Space Optimization of Shared Memory Architecture in Accelerator-rich Systems","authors":"Mitali Sinha, G. Harsha, Pramit Bhattacharyya, Sujay Deb","doi":"10.1145/3446001","DOIUrl":"https://doi.org/10.1145/3446001","url":null,"abstract":"Shared memory architectures, as opposed to private-only memories, provide a viable alternative to meet the ever-increasing memory requirements of multi-accelerator systems to achieve high performance under stringent area and energy constraints. However, an impulsive memory sharing degrades performance due to network contention and latency to access shared memory. We propose the Accelerator Shared Memory (ASM) framework to provide an optimal private/shared memory configuration and shared data allocation under a system’s resource and network constraints. Evaluations show ASM provides up to 34.35% and 31.34% improvement in performance and energy, respectively, over baseline systems.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"106 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2021-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82544050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TAAL","authors":"Ayush Jain, Ziqi Zhou, Ujjwal Guin","doi":"10.1145/3442379","DOIUrl":"https://doi.org/10.1145/3442379","url":null,"abstract":"Due to the globalization of semiconductor manufacturing and test processes, the system-on-a-chip (SoC) designers no longer design the complete SoC and manufacture chips on their own. This outsourcing of the design and manufacturing of Integrated Circuits (ICs) has resulted in several threats, such as overproduction of ICs, sale of out-of-specification/rejected ICs, and piracy of Intellectual Properties (IPs). Logic locking has emerged as a promising defense strategy against these threats. However, various attacks about the extraction of secret keys have undermined the security of logic locking techniques. Over the years, researchers have proposed different techniques to prevent existing attacks. In this article, we propose a novel attack that can break any logic locking techniques that rely on the stored secret key. This proposed TAAL attack is based on implanting a hardware Trojan in the netlist, which leaks the secret key to an adversary once activated. As an untrusted foundry can extract the netlist of a design from the layout/mask information, it is feasible to implement such a hardware Trojan. All three proposed types of TAAL attacks can be used for extracting secret keys. We have introduced the models for both the combinational and sequential hardware Trojans that evade manufacturing tests. An adversary only needs to choose one hardware Trojan out of a large set of all possible Trojans to launch the TAAL attack.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"142 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2021-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77840797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Taming the Overhead Monster for Data-flow Integrity","authors":"Lang Feng, Jiayi Huang, Jeff Huang, Jiang Hu","doi":"10.1145/3490176","DOIUrl":"https://doi.org/10.1145/3490176","url":null,"abstract":"Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks. However, its real-world application has been quite limited so far because of the prohibitive performance overhead it incurs. Moreover, the overhead is enormously difficult to overcome without substantially lowering the DFI criterion. In this work, an analysis is performed to understand the main factors contributing to the overhead. Accordingly, a hardware-assisted parallel approach is proposed to tackle the overhead challenge. Simulations on SPEC CPU 2006 benchmark show that the proposed approach can completely enforce the DFI defined in the original seminal work while reducing performance overhead by 4×, on average.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"250 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72952574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}