{"title":"A parallel random walk solver for the capacitance calculation problem in touchscreen design","authors":"Zhezhao Xu, Wenjian Yu, Chao Zhang, Bolong Zhang, Meijuan Lu, M. Mascagni","doi":"10.1145/2902961.2903011","DOIUrl":"https://doi.org/10.1145/2902961.2903011","url":null,"abstract":"In this paper, a random walk based solver is presented which calculates capacitances for verifying a touchscreen design. To suit the complicated conductor geometries in touchscreen structures, we extend the floating random walk (FRW) method for handling non-Manhattan conductors. A unified dielectric pre-characterization scheme is proposed to suit arbitrary dielectric profiles while keeping high accuracy. The algorithm is finally implemented on a computer cluster, which enables massively parallel computing. Numerical experiments validate the accuracy of the proposed techniques and the up to 67X parallel speedup. Compared with other schemes, the unified dielectric pre-characterization scheme exhibits the highest accuracy while costing the least in terms of memory usage.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115271241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and comparative evaluation of a hybrid Cache memory at architectural level","authors":"Wei Wei, K. Namba, F. Lombardi","doi":"10.1145/2902961.2903002","DOIUrl":"https://doi.org/10.1145/2902961.2903002","url":null,"abstract":"A hybrid memory cell usually consists of a Static Random Access Memory (SRAM) and an embedded Dynamic Random Access Memory (eDRAM) cell; hybrid cells are particularly suitable for cache design. A novel hybrid cache memory scheme (that has also non-volatile elements) is initially proposed; this scheme is assessed through extensive simulation to show significant improvements in performance. Different design implementations of the hybrid cache are then proposed at architectural level and different features (such as the memory hit rate, the Instruction Per Cycle (IPC) access pattern and the memory cell access time) are also simulated at this level using benchmarks to show the advantages of the proposed scheme for use as an hybrid cache.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115681702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low energy sketching engines on many-core platform for big data acceleration","authors":"A. Kulkarni, Tahmid Abtahi, E. Smith, T. Mohsenin","doi":"10.1145/2902961.2902984","DOIUrl":"https://doi.org/10.1145/2902961.2902984","url":null,"abstract":"Almost 90% of the data available today was created within the last couple of years, thus Big Data set processing is of utmost importance. Many solutions have been investigated to increase processing speed and memory capacity, however I/O bottleneck is still a critical issue. To tackle this issue we adopt Sketching technique to reduce data communications. Reconstruction of the sketched matrix is performed using Orthogonal Matching Pursuit (OMP). Additionally we propose Gradient Descent OMP (GD-OMP) algorithm to reduce hardware complexity. Big data processing at real-time imposes rigid constraints on sketching kernel, hence to further reduce hardware overhead both algorithms are implemented on a low power domain specific many-core platform called Power Efficient Nano Clusters (PENC). GD-OMP algorithm is evaluated for image reconstruction accuracy and the PENC many-core architecture. Implementation results show that for large matrix sizes GD-OMP algorithm is 1.3× faster and consumes 1.4× less energy than OMP algorithm implementations. Compared to GPU and Quad-Core CPU implementations the PENC many-core reconstructs 5.4× and 9.8× faster respectively for large signal sizes with higher sparsity.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121804779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DCC: Double capacity Cache architecture for narrow-width values","authors":"M. Imani, S. Patil, T. Simunic","doi":"10.1145/2902961.2902990","DOIUrl":"https://doi.org/10.1145/2902961.2902990","url":null,"abstract":"Modern caches are designed to hold 64-bits wide data, however a proportion of data in the caches continues to be narrow width. In this paper, we propose a new cache architecture which increases the effective cache capacity up to 2X for the systems with narrow-width values, while also improving its power efficiency, bandwidth, and reliability. The proposed double capacity cache (DCC) architecture uses a fast and efficient peripheral circuitry to store two narrow-width values in a single wordline. In order to minimize the latency overhead in workloads without narrow-width data, the flag bits are added to tag store. The proposed DCC architecture decreases cache miss-rate by 50%, which results in 27% performance improvement and 30% higher dynamic energy efficiency. To improve reliability, DCC modifies the data distribution on individual bits, which results in 20% and 25% average static-noise margin (SNM) improvement in L1 and L2 caches respectively.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124885688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An enhanced analytical electrical masking model for multiple event transients","authors":"Adam Watkins, S. Tragoudas","doi":"10.1145/2902961.2903007","DOIUrl":"https://doi.org/10.1145/2902961.2903007","url":null,"abstract":"Due to the reducing transistor feature size, the susceptibility of modern circuits to radiation induced errors has increased. This, as a result, has increased the likelihood of multiple transients affecting a circuit. An important aspect when modeling convergent pulses is the approximation of the gate output. Thus, in this paper, a model that approximates the output pulse shape for convergent inputs is proposed. Extensive simulations showed that the proposed model matched closely with HSPICE and provides a speed-up of 15X.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114387049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay estimates for graphene nanoribbons: A novel measure of fidelity and experiments with global routing trees","authors":"Subrata Das, Soma Das, Adrija Majumder, P. Dasgupta, D. K. Das","doi":"10.1145/2902961.2903036","DOIUrl":"https://doi.org/10.1145/2902961.2903036","url":null,"abstract":"With extreme miniaturization of traditional CMOS devices in deep sub-micron design levels, the delay of a circuit, as well as power dissipation and area are dominated by interconnections between logic blocks. In an attempt to search for alternative materials, Graphene nanoribbons (GNRs) have been found to be potential for both transistors and interconnects due to its outstanding electrical and thermal properties. GNRs provide better options as materials used for global routing trees in VLSI circuits. However, certain special characteristics of GNRs prohibit direct application of existing VLSI routing tree construction methods for the GNR-based interconnection trees. In this paper, we address this issue possibly for the first time, and propose a heuristic method for construction of GNR-based minimum-delay Steiner trees based on linear-cum-bending hybrid delay model. Experimental results demonstrate the effectiveness of our proposed methods. We propose a novel technique for analyzing the relative accuracy of the delay estimates using rank correlation and statistical significance test. We also compute the delays for the trees generated by hybrid delay heuristic using Elmore delay approximation and use them for determining the relative accuracy of the hybrid delay estimate.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122093983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An offline frequent value encoding for energy-efficient MLC/TLC non-volatile memories","authors":"Ali Alsuwaiyan, K. Mohanram","doi":"10.1145/2902961.2902979","DOIUrl":"https://doi.org/10.1145/2902961.2902979","url":null,"abstract":"This paper describes a low overhead, offline frequent value encoding (FVE) solution to reduce the write energy in multi-level/triple-level cell (MLC/TLC) non-volatile memories (NVMs). The proposed solution, which does not require any runtime software support, clusters a set of general-purpose applications according to their data frequency profiles and generates a dedicated offline FVE that minimizes write energy for each cluster. Results show that the write energy reduction of evaluation sets - using FVEs generated for training sets - are close (equal) to the best known solution for MLC (TLC) NVM encoding; however, our solution incurs a memory overhead that is 16× (5.7×) less than the best comparable scheme in the literature for MLC (TLC) NVMs.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126236134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area-efficient error-resilient discrete fourier transformation design using stochastic computing","authors":"Bo Yuan, Yanzhi Wang, Zhongfeng Wang","doi":"10.1145/2902961.2902978","DOIUrl":"https://doi.org/10.1145/2902961.2902978","url":null,"abstract":"Discrete Fourier Transformation (DFT)/Fast Fourier Transformation (FFT) are the widely used techniques in numerous modern signal processing applications. In general, because of their inherent multiplication-intensive characteristics, the hardware implementations of DFT/FFT usually require a large amount of hardware resource, which limits their applications in area-constraint scenarios. To overcome this challenge, this paper, for the first time, proposes area-efficient error-resilient DFT designs using stochastic computing. By leveraging low-complexity stochastic multipliers, two types of stochastic DFT design are presented with significant reduction in overall area. Analysis results show that compared with the conventional design, the proposed two 256-point stochastic DFT designs achieve 76% and 62% reduction in area, respectively. More importantly, these stochastic DFT designs also show much stronger error-resilience, which is very attractive in nanoscale CMOS era.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115980281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prolonging lifetime of non-volatile last level caches with cluster mapping","authors":"Morteza Soltani, Mohammad Ebrahimi, Z. Navabi","doi":"10.1145/2902961.2902980","DOIUrl":"https://doi.org/10.1145/2902961.2902980","url":null,"abstract":"Recently, work has been done on using nonvolatile cells, such as Spin Transfer Torque RAM (STT-RAM) or Magnetic RAM (M-RAM), to construct last level caches (LLC). These structures mitigate the leakage power and density problem found in traditional SRAM cells. However, the low endurance of nonvolatile caches decreases the lifetime of the LLC. Therefore, an effective wear-leveling technique is required to tackle this issue. In this paper, we propose the inter-set algorithm that distributes the write traffic to all portions of the cache. Our method is based on cluster mapping that dynamically replaces two clusters during the operation of system. Since the inter-set algorithm is based on data movement, a large amount of data must transfer in each replacement. For an efficient data movement with a minimum effect on performance, we develop the novel scheduling technique that utilizes the idle time of the LLC in the computation phase of the processors. Our approach effectively improves the lifetime of LLC with negligible performance and area overhead. Using these methods in a quad core system with 2MB LLC, we can improve the lifetime of non-volatile LLC by 30% on average.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing fault emulation of transient faults by separating combinational and sequential fault propagation","authors":"R. Nyberg, Johann Heyszl, Dietmar Heinz, G. Sigl","doi":"10.1145/2902961.2903021","DOIUrl":"https://doi.org/10.1145/2902961.2903021","url":null,"abstract":"We present a fault emulation environment capable of injecting single and multiple transient faults in sequential as well as combinational logic. It is used to perform fault injection campaigns during design verification of security circuits such as smart cards. In order to reduce the unacceptable hardware overhead of fault emulation for combinational faults, we split the problem of combinational fault modeling into two steps: 1) Fault injection in combinational cells and propagation into sequential cells, processed by a software approach, and 2) fast FPGA-based fault emulation of faults in sequential logic. We used the presented tool to emulate single and multiple faults in two different designs used for security applications. We analyzed how faults propagate from combinational to sequential logic, discuss the resulting consequences for developers of security circuits and fault analysis environments and derive performance optimizations. We demonstrate the performance of our method with varying tests and varying fault multiplicities. Interestingly, we found that the presented method outperforms conventional standalone FPGA-based approaches, while it requires 45% less logic elements on the FPGA.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131177588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}