{"title":"Mathematical analysis of buffer sizing for Network-on-Chips under multimedia traffic","authors":"A. Khonsari, Mohammadreza Aghajani, Arash Tavakkol, M. S. Talebi","doi":"10.1109/ICCD.2008.4751854","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751854","url":null,"abstract":"Designing appropriate buffer sizes for routers within Network-on-Chip (NoC) so as to minimize the power while preserving the required performance in the presence of self-similar traffic has been considered a challenging problem in the literature. A few analytical studies carried out in NoC modeling have adopted assumptions such as exponentially-distributed packet inter-arrivals, and conclusions reached under such assumptions may be inappropriate in the presence of self-similar traffic. Through mathematical analysis this paper predicts the optimal buffer size under self-similar traffic using Discrete Poisson Pareto Burst Process (DPPBP). The validity of the mathematical expressions is demonstrated through simulation experiments.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114248451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Test cost minimization through adaptive test development","authors":"Mingjing Chen, A. Orailoglu","doi":"10.1109/ICCD.2008.4751867","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751867","url":null,"abstract":"The ever-increasing complexity of mixed-signal circuits imposes an increasingly complicated and comprehensive parametric test requirement, resulting in a highly lengthened manufacturing test phase. Attaining parametric test cost reduction with no test quality degradation constitutes a critical challenge during test development. The capability of parametric test data to capture systematic process variations engenders a highly accurate prediction of the efficiency of each test for a particular lot of chips even on the basis of a small quantity of characterized data. The predicted test efficiency further enables the adjustment of the test set and test order, leading to an early detection of faults. We explore such an adaptive strategy, by introducing a technique that prunes the test set based on a test correlation analysis. A test selection algorithm is proposed to identify the minimum set of tests that delivers a satisfactory defect coverage. A probabilistic measure that reflects the defect detection efficiency is used to order the test set so as to enhance the probability of an early detection of faulty chips. The test sequence is further optimized during the testing process by dynamically adjusting the initial test order to adapt to the local defect pattern fluctuations in the lot of chips under test. Experimental results show that the proposed technique delivers significant test time reductions while attaining higher test quality compared to previous adaptive test methodologies.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127890990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SynECO: Incremental technology mapping with constrained placement and fast detail routing for predictable timing improvement","authors":"Anuj Kumar, Tai-Hsuan Wu, A. Davoodi","doi":"10.1109/ICCD.2008.4751915","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751915","url":null,"abstract":"We present SynECO, a framework to achieve predictable timing improvement via incremental resynthesis and replacement. We target timing-critical paths postplacement and resynthesize and replace promising gates. We show since the wire delays are the non-negligible contributors to a critical-path delay, it is crucial to accurately estimate them to make a predictable synthesis modification. For this purpose, we incorporate an accurate timing analysis tool which uses fast detail routing for wire delay estimation. This allows generating timing estimates that correlate much better with post-routing values compared to Steiner-tree-based estimate of wiring tree and using D2M delay model. Detail routing information allows incorporation of factors such as crosstalk, metal layer assignment and via delays which are crucial for accurate analysis. For fast synthesis, we constrain our logical modifications to be from the physical neighborhood of target gates on the critical paths. Our synthesis framework is completely integrated with the Cadence Encounter tools for physical design.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"134 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130989020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-cost open-page prefetch scheduling in chip multiprocessors","authors":"Marius Grannæs, Magnus Jahre, L. Natvig","doi":"10.1109/ICCD.2008.4751890","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751890","url":null,"abstract":"The pressure on off-chip memory increases significantly as more cores compete for the same resources. A CMP deals with the memory wall by exploiting thread level parallelism (TLP), shifting the focus from reducing overall memory latency to memory throughput. This extends to the memory controller where the 3D structure of modern DRAM is exploited to increase throughput. Traditionally, prefetching reduces latency by fetching data before it is needed. In this paper we explore how prefetching can be used to increase memory throughput. We present our own low-cost open-page prefetch scheduler that exploits the 3D structure of DRAM when issuing prefetches. We show that because of the complex structure of modern DRAM, prefetches can be made cheaper than ordinary reads, thus making prefetching beneficial even when prefetcher accuracy is low. As a result, prefetching with good coverage is more important than high accuracy. By exploiting this observation our low-cost open page scheme increases performance and QoS. Furthermore, we explore how prefetches should be scheduled in a state of the art memory controller by examining sequential, scheduled region, CZone/delta correlation and reference prediction table prefetchers.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115831101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conversion driven design of binary to mixed radix circuits","authors":"A. Rafiev, Julian P. Murphy, D. Sokolov, A. Yakovlev","doi":"10.1109/ICCD.2008.4751893","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751893","url":null,"abstract":"A conversion driven design approach is described. It takes the outputs of mature and time-proven EDA synthesis tools to generate mixed radix datapath circuits in an endeavour to investigate the added relative advantages or disadvantages. An algorithm underpinning the approach is presented and formally described together with m-of-n encoded gate-level implementations. The application is found in a wide variety and overlapping areas of circuit design, here a subset are analysed where the method finds the strongest application: arithmetic circuits and hardware security. The obtained results are reported showing an increase in power consumption but with considerable improvement in resistance to differential power analysis (DPA).","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130858688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically reconfigurable soft output MIMO detector","authors":"Pankaj Bhagawat, Rajballav Dash, G. Choi","doi":"10.1109/ICCD.2008.4751842","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751842","url":null,"abstract":"MIMO systems (with multiple transmit and receive antennas) are becoming increasingly popular, and many next-generation systems such as WiMAX, 3GPP LTE and IEEE 802.11n wireless LANs rely on the increased throughput of MIMO systems with up to four antennas at receiver and transmitter. High throughput implementation of the detection unit for MIMO systems is a significant challenge. This challenge becomes still harder, because the above mentioned standards demand support for multiple modulation and coding schemes. This implies that the MIMO detector must be dynamically reconfigurable. Also, to achieve required bit error rate (BER) or frame error rate (FER) performance, the detector has to provide soft values to advanced forward error correction (FEC) schemes like turbo codes. This paper presents an ASIC implementation of a novel MIMO detector architecture that is able to reconfigure on the fly and provides soft values as output. The design is implemented in 45 nm predictive technology library, and has a parallelism factor of four. The detector has many qualities of a systolic architecture and achieves a continuous throughput of 1 Gbps for QPSK, 500 Mbps for 16-QAM, and 187.5 Mbps for 64-QAM. The total area is estimated to be approximately 70 KGates equivalent, and power consumption is estimated to be 114 mW.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130868165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation points for SPEC CPU 2006","authors":"Arun A. Nair, L. John","doi":"10.1109/ICCD.2008.4751891","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751891","url":null,"abstract":"Increasing sizes of benchmarks make detailed simulation an extremely time consuming process. Statistical techniques such as the SimPoint methodology have been proposed in order to address this problem during the initial design phase. The SimPoint methodology attempts to identify repetitive, long, large-grain phases in programs and predict the performance of the architecture based on its aggregate performance on the individual phases. This study attempts to compare accuracy of the SimPoint methodology for the SPEC CPU 2006 benchmark suite with that of SPEC CPU 2000 and to study the large-grain phases in the two benchmark suites using the SimPoint methodology. We find that there has not been a significant increase in the number of simulation points required to accurately predict the behavior of the programs in SPEC CPU 2006, despite its significantly larger data footprint and dynamic instruction count. We also find that the programs in both benchmark suites have similar characteristics in terms of the number of phases that contribute significantly towards overall behavior, further emphasizing the similarity between the two benchmark suites with respect to the number of simulation points required for similar accuracies.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125095867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization and design of sequential circuit elements to combat soft error","authors":"H. Abrishami, S. Hatami, Massoud Pedram","doi":"10.1109/ICCD.2008.4751861","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751861","url":null,"abstract":"This paper performs analysis and design of latches and flip-flops while considering the effect of event upsets caused by energetic particle hits. First it is shown that the conventional analysis of this effect in sequential circuit elements (SCEs) tends to underestimate the threat posed by such events. More precisely, there exists a timing window close to the triggering edge of the clock during which a SCE is more vulnerable to the particle hit. This phenomenon has been ignored by previous work, resulting in false negatives. Next the paper explains how to size transistors of a familiar SCE i.e., a clocked CMOS latch, to make it more robust to such events. Experimental results to validate the characterization and transistor sizing steps are provided and discussed.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121302511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive techniques for leakage power management in L2 cache peripheral circuits","authors":"H. Homayoun, A. Veidenbaum, J. Gaudiot","doi":"10.1109/ICCD.2008.4751917","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751917","url":null,"abstract":"Recent studies indicate that a considerable amount of an L2 cache leakage power is dissipated in its peripheral circuits, e.g., decoders, word-lines and I/O drivers. In addition, L2 cache is becoming larger, thus increasing the leakage power. This paper proposes two adaptive architectural techniques (ADM and ASM) to reduce leakage in the L2 cache peripheral circuits. The adaptive techniques use the product of cache hierarchy miss rates to guide the leakage control in accordance with program behavior. The results for SPEC2K benchmarks show that the first technique (ASM) achieves a 34% average leakage power reduction with a 1.8% average IPC reduction. The second technique (ADM) achieves a 52% average savings with a 1.9% average IPC reduction. This corresponds to a 2 to 3X improvement over recently proposed static techniques.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125804978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-silicon verification for cache coherence","authors":"A. DeOrio, Adam Bauserman, V. Bertacco","doi":"10.1109/ICCD.2008.4751884","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751884","url":null,"abstract":"Modern processor designs are extremely complex and difficult to validate during development, causing a growing portion of the verification effort to shift to post-silicon, after the first few hardware prototypes become available. Extremely slow simulation speeds during pre-silicon verification result in functional errors escaping into silicon, a problem that is further exacerbated by the growing complexity of the memory subsystem in multi-core platforms. In this work we present CoSMa, a novel technology offering high coverage functional post-silicon validation of cache coherence protocols in multi-core systems. It enables the detection and diagnosis of functional errors in the memory subsystem by recording at runtime a compact encoding of the operations occurring at each cache line and checking their correctness at regular intervals. We leverage the system's existing memory resources to store the required activity, thus minimizing area overhead. When the system is finally ready for customer shipment, CoSMa can be completely disabled, eliminating any performance or memory overhead. We reproduce in our experiments a set of coherence protocol bugs based on published errata documents of commercial multi-core designs, and show that CoSMa is highly effective in detecting them.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123188581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}