{"title":"Cache Partitioning + Loop Tiling: A Methodology for Effective Shared Cache Management","authors":"Vasilios I. Kelefouras, G. Keramidas, N. Voros","doi":"10.1109/ISVLSI.2017.89","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.89","url":null,"abstract":"In this paper, we present a new methodology that provides i) a theoretical analysis of the two most commonly used approaches for effective shared cache management (i.e., cache partitioning and loop tiling) and ii) a unified framework to fine-tune those two mechanisms in tandem (not separately). Our approach manages to lower the number of main memory accesses by one order of magnitude while keeping the number of arithmetical/addressing instructions at a minimal level. We also present a search space exploration analysis in which our proposal offers a vast reduction in the required search space.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132527552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Making Fault Injection on Abstract Models a More Accurate Tool for Predicting RT-Level Effects","authors":"Tino Flenker, Jan Malburg, G. Fey, Serhiy Avramenko, M. Violante, M. Reorda","doi":"10.1109/ISVLSI.2017.99","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.99","url":null,"abstract":"Fault injection and fault simulation are a typical approach to analyze the effect of a fault on a hardware/software system. Often fault injection is done on abstract models of the system either to retrieve early results when no implementation is available yet, or to speed up the runtime-intensive fault simulation on detailed models. The simulation results from the abstract model are typically inaccurate because details of the concrete hardware are missing. Here, we propose an approach to relate faults from an abstract untimed algorithmic model to their counterparts in the concrete register transfer models. This makes it possible to understand which faults are covered on the concrete model and to speed up the fault simulation process. We use a mapping between both models' variables and mapped timing states for fault injection to corresponding variables on both models. After fault simulations the results are compared to check whether a given fault produces the same behavior on both models. The results show that an injected fault to corresponding variables leads to the same behavior of both models for a large share of faults.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130790060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AICNN: Implementing Typical CNN Algorithms with Analog-to-Information Conversion Architecture","authors":"Kaige Jia, Zheyu Liu, F. Qiao, Xinjun Liu, Qi Wei, Huazhong Yang","doi":"10.1109/ISVLSI.2017.23","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.23","url":null,"abstract":"AICNN architecture is presented in this work to map the state-of-the-art machine-learning algorithms of CNN to power-constrained embedded hardware. As the combination of analog-to-information conversion and typical CNN algorithms, AICNN can realize ultra-highly efficient computation by using massive parallel analog signal processing circuits, which could also significantly reduce the cost of ADC devices for converting sensors' outputs. As a design example, the specific AICNN-3 implementation is evaluated, which realizes the minimum system of a typical CNN task using the AICNN architecture, with the SMIC 0.18 µm CMOS process. Simulation results show that the AICNN-3 can classify a 28x28 MNIST image with only 1.47 nJ. Compared with a baseline implementation on CPU, the AICNN-3 could achieve a 67000x energy-efficiency improvement, while the accuracy loss is less than 1%. Moreover, the influences of device mismatch and process variations are evaluated using the Monte Carlo statistical method to account for the imperfection of the analog processing paradigm, and the scalability of the AICNN architecture is also discussed.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130280300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fully Integrated Fast-Response LDO Voltage Regulator with Adaptive Transient Current Distribution","authors":"X. Tong, Kang Wei","doi":"10.1109/ISVLSI.2017.93","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.93","url":null,"abstract":"A fully integrated low-dropout (LDO) regulator with fast transient response is proposed in this paper. The capacitor-less LDO (CL-LDO) regulator incorporates both assisted pass-transistors and a control circuit to realize adaptive transient current distribution during the load current transition, thereby enhancing the transient response and minimizing the output voltage's spike. In a 65-nm CMOS process, the CL-LDO regulator occupies an active area of 0.0088 mm². It supplies an output voltage of 1.2 V, while the input supply ranges from 1.5 V to 2.5 V. Subjected to a 100 µA ± 10 mA step change of load current with 1-µs rise time and fall time, the regulator can settle the output to a stable voltage within 1.1 µs and the output voltage's spike is reduced to less than ± 20 mV. The line regulation and load regulation of this regulator are 0.52 mV/V and 0.01 mV/mA, respectively.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117049694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Centrality Indicators for Efficient and Scalable Logic Masking","authors":"Brice Colombier, L. Bossuet, D. Hély","doi":"10.1109/ISVLSI.2017.26","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.26","url":null,"abstract":"Modifying the logic at register transfer level can help to protect a circuit against counterfeiting or illegal copying. By adding extra gates, the outputs can be controllably corrupted. Then the circuit operates correctly only if the right value is applied to the extra gates. The main challenge is to select the best position for these gates, to alter the circuit's behaviour as much as possible. However, another major point is the computational efficiency of the selection process, which should be as good as possible for integration in EDA tools. State-of-the-art methods, based on fault analysis, are very demanding and cannot cope with large netlists in a reasonable runtime. We propose to use centrality indicators instead. Centrality is used to identify the most significant vertices of a graph. We show that, when used to select the nodes to modify, they lead to low correlation between original and altered outputs while being computationally efficient. We give experimental results on combinational benchmarks and compare to other previously proposed heuristics. We show that this method is the only efficient selection heuristic which is able to handle large netlists and integrate smoothly into EDA tools.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129740279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Speed Power Efficient Carry Select Adder Design","authors":"Raghava Katreepalli, T. Haniotakis","doi":"10.1109/ISVLSI.2017.16","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.16","url":null,"abstract":"Adders are basic building blocks of any processor or data path application. For the design of high performance processing units, high speed adders with low power consumption are a requirement. Carry Select Adder (CSA) is known to be one of the fastest adders used in many data processing applications. In this paper, we present a new CSA architecture using Manchester carry chain (MCC) in multioutput domino CMOS logic. It employs novel MCC blocks in a hierarchical approach in the design of the CSA. The proposed design is validated by implementation of 16 and 32-bit adder circuits in a standard 45nm CMOS process technology. This work evaluates the performance of the proposed designs in terms of delay, power consumption and hardware overhead. The results are analyzed and compared with existing fast adder architectures to prove its efficiency. The simulation results show that the proposed architecture achieves two-fold advantages in terms of power-delay product (PDP) and hardware overhead.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128301052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SiLago-CoG: Coarse-Grained Grid-Based Design for Near Tape-Out Power Estimation Accuracy at High Level","authors":"Syed M. A. H. Jafri, Nasim Farahini, A. Hemani","doi":"10.1109/ISVLSI.2017.15","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.15","url":null,"abstract":"It is well known that ASICs have orders of magnitude higher power efficiency than general purpose processors. However, due to the high engineering and manufacturing cost, only a handful of companies can afford to design ASICs. To reduce this cost, numerous high-level synthesis tools have emerged over the last 2-3 decades. In spite of these tools, ASIC design is still considered expensive because they fail to accurately predict the cost metrics. The inaccuracy is costly as it results in multiple iterations between RTL, logic synthesis, and physical design. The major reason behind this inaccuracy, at high level, is the unavailability of information like wiring, orientation, and placement of hardware blocks. To tackle this issue, recent works have proposed to raise the abstraction of the physical design from standard cells to micro-architectural blocks physically organized in a structured grid-based layout scheme. While these works have been successful in accurately predicting area and timing, to the best of our knowledge their effectiveness in accurately estimating power is yet to be determined. SiLago-CoG provides an efficient technique to characterize these blocks and estimate power at high level. Simulation and synthesis results reveal that SiLago-CoG provides up to 15X better power estimates in 680X less time at the cost of up to 50% additional area, compared to state-of-the-art.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129982916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voltage Noise Analysis with Ring Oscillator Clocks","authors":"Lucas Machado, A. Perez, J. Cortadella","doi":"10.1109/ISVLSI.2017.11","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.11","url":null,"abstract":"Voltage noise is the main source of dynamic variability in integrated circuits and a major concern for the design of Power Delivery Networks (PDNs). Ring Oscillator Clocks (ROCs) have been proposed as an alternative to mitigate the negative effects of voltage noise as technology scales down and power density increases. However, their effectiveness highly depends on the design parameters of the PDN, power consumption patterns of the system and spatial locality of the ROCs within the clock domains. This paper analyzes the impact of the PDN parameters and ROC location on the robustness to voltage noise. The capability of reacting instantaneously to unpredictable voltage droops makes ROCs an attractive solution, which allows reducing the amount of decoupling capacitance without downgrading performance. Tolerance to voltage noise and related benefits can be increased by using multiple ROCs and reducing the size of the clock domains. The analysis shows that up to 83% of the margins for voltage noise and up to 27% of the leakage power can be reduced by using local ROCs.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114772439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resilient Cell-Based Architecture for Time-to-Digital Converter","authors":"Chia-Hua Wu, Shi-Yu Huang, Mason Chern, Yung-Fa Chou, D. Kwai","doi":"10.1109/ISVLSI.2017.12","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.12","url":null,"abstract":"This paper proposes a resilient Time-to-Digital Converter (TDC) that lends itself to cell-based design automation. We adopt a shrinking-based architecture with a number of distinctive techniques. First of all, a specialized on-chip re-calibration scheme is developed so that the real-time transfer function of the TDC in silicon (which maps an input pulse-width to its corresponding output code) can be derived on the chip and thereby the absolute value (instead of just a relative code) of an input pulse-width under measurement can be reported. Secondly, the sampling errors stemming from the jitters of training clocks used in the calibration scheme are mitigated by the principle of multi sampling. Thirdly, a flexible coarse-shrinking block is adopted and an automatic adjustment scheme is employed so that the coarse-shrinking block can adjust itself when operated under different input pulse-width ranges.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123519877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample Preparation on Micro-Electrode-Dot-Array Digital Microfluidic Biochips","authors":"Zipeng Li, Kelvin Yi-Tse Lai, K. Chakrabarty, Tsung-Yi Ho, Chen-Yi Lee","doi":"10.1109/ISVLSI.2017.34","DOIUrl":"https://doi.org/10.1109/ISVLSI.2017.34","url":null,"abstract":"Sample preparation in digital microfluidics refers to the generation of droplets with target concentrations for on-chip biochemical applications. In recent years, digital microfluidic biochips (DMFBs) have been adopted as a platform for sample preparation. However, there remains one major problem associated with sample preparation on a conventional DMFB. For conventional DMFBs, only a (1:1) mixing/splitting model can be used, leading to an increase in the number of fluidic operations required for sample preparation. To overcome this drawback, we adopt a next-generation DMFB platform, referred to as micro-electrode-dot-array (MEDA), for sample preparation. We propose the first sample preparation method that exploits the MEDA-specific advantages of fine-grained control of droplet sizes and real-time droplet sensing. Experimental demonstration using a fabricated MEDA biochip and simulation results highlight the effectiveness of the proposed sample-preparation method.","PeriodicalId":187936,"journal":{"name":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123497671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}