{"title":"Improving inclusive cache performance with two-level eviction priority","authors":"Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng","doi":"10.1109/ICCD.2012.6378668","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378668","url":null,"abstract":"Inclusive cache hierarchies are widely adopted in modern processors, since they can simplify the implementation of cache coherence. However, it sacrifices some performance to guarantee inclusion. Many recent intelligent management policies are proposed to improve the last-level cache (LLC) performance by evicting blocks with poor locality earlier. Unfortunately, they are inapplicable in inclusive LLCs. In this paper, we propose Two-level Eviction Priority (TEP) policy. Besides the eviction priority provided by the baseline replacement policy, TEP appends an additional high level of eviction priority to LLC blocks, which is decided at the insertion time and cannot be changed during their lifetime in the LLC. When blocks with high eviction priority are not in inner caches anymore, they get evicted from the LLC preferentially. Thus, the LLC can retain more useful blocks to improve performance. TEP can cooperate well with various baseline replacement policies. Our evaluation shows that TEP with NRU can improve the performance of inclusive LLCs significantly while requiring negligible extra storage. It also outperforms other recent proposals including QBS, DIP, and DRRIP.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116637529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locating faults in application-dependent interconnects of SRAM based FPGAs","authors":"T. N. Kumar, H. Almurib, F. Lombardi","doi":"10.1109/ICCD.2012.6378678","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378678","url":null,"abstract":"This paper presents a new method for locating multiple faults in an interconnect following application testing of an FPGA. This method utilizes conditions related to the interconnect structure and in particular, the presence of paths of nets that are either disjoint or joint between the primary input and at least one primary output. They yield to a rather adaptive approach by which faults are hierarchically located using the walking-1 test set. The proposed method is not dependent on net ordering and is capable to locate multiple stuck-at and pairwise bridging faults. This process requires 1+log2 k test configurations for multiple stuck-at location and 2+2log2 k additional test configurations to locate more than one pair-wise bridging faults (where k denotes the maximum combinational depth). As validated by simulation for benchmark circuits (implemented on the Xilinx Virtex4), the proposed method results in a significant reduction in the number of configurations.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127032025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximizing crosstalk-induced slowdown during path delay test","authors":"Dibakar Gope, D. Walker","doi":"10.1109/ICCD.2012.6378635","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378635","url":null,"abstract":"In this paper, we present a timing-driven test generator to sensitize multiple aligned aggressors coupled to a delay-sensitive victim path to detect the combination of a delay spot defect and crosstalk-induced slowdown. The framework uses parasitic capacitance information, timing windows and crosstalk-induced delay estimates to screen out unaligned or ineffective aggressors coupled to a victim path, speeding up crosstalk pattern generation. In order to induce maximum crosstalk slowdown along a path, aggressors are prioritized based on their potential delay increase and timing alignment. The test generation engine introduces the concept of alignment-driven path sensitization to generate paths from inputs to coupled aggressor nets that meet timing alignment and direction requirements. In addition, two new crosstalk-driven dynamic test compaction algorithms are developed to control the increase in test pattern count. The proposed test generation algorithm is applied to ISCAS85 and ISCAS89 benchmark circuits. SPICE simulation results demonstrate the ability of the alignment-driven test generator to increase crosstalk-induced delays along victim paths.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133109242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ring oscillator physical unclonable function with multi level supply voltages","authors":"S. Mansouri, E. Dubrova","doi":"10.1109/ICCD.2012.6378703","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378703","url":null,"abstract":"In this paper we introduce a new type of Ring Oscillator PUF (RO-PUF) in which the inverters composing the ring oscillators can be supplied by independent voltages. This new RO-PUF can improve the reliability of the PUF in case of temperature variations.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116089845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, L. Benini
{"title":"Xpipes: A latency insensitive parameterized network-on-chip architecture for multi-processor SoCs","authors":"M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, L. Benini","doi":"10.1109/ICCD.2012.6378615","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378615","url":null,"abstract":"The growing complexity of customizable embedded multi-processor architectures for digital media processing will soon require highly scalable network-on-chip based communication infrastructures. In this paper, we propose xpipes, a scalable and high-performance NoC architecture for multi-processor SoCs, consisting of soft macros that can be turned into instance-specific network components at instantiation time. The flexibility of its components allows our NoC to support both homogeneous and heterogeneous architectures. The interface with IP cores at the periphery of the network is standardized (OCP-based). Links can be pipelined with a flexible number of stages to decouple data introduction speed from worst-case link delay. Switches are lightweight and support reliable communication for arbitrary link pipeline depths (latency insensitive operation). xpipes has been described in synthesizable SystemC, at the cycle-accurate and signal-accurate level.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"306 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114395560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting microarchitectural redundancy for defect tolerance","authors":"P. Shivakumar, S. Keckler, C. R. Moore, D. Burger","doi":"10.1109/ICCD.2012.6378613","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378613","url":null,"abstract":"The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size, creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Methods based on defect detection and reduction may not offer a scalable solution due to cost of eliminating contaminants in the manufacturing process and increasing chip complexity. This paper proposes to use the inherent redundancy available in existing and future chip microarchitectures to improve yield and enable graceful performance degradation in fail-in-place systems. We introduce a new yield metric called performance averaged yield (Ypav) which accounts both for fully functional chips and those that exhibit some performance degradation. Our results indicate that at 250nm we are able to increase the Ypav of a uniprocessor with only redundant rows in its caches from a base value of 85% to 98% using microarchitectural redundancy. Given constant chip area, shrinking feature sizes increases fault susceptibility and reduces the base Ypav to 60% at 50nm, which exploiting microarchitectural redundancy then increases to 99.6%.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129320264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power-sensitive multithreaded architecture","authors":"J. Seng, D. Tullsen, George Z. N. Cai","doi":"10.1109/ICCD.2012.6378610","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378610","url":null,"abstract":"The power consumption of microprocessors is becoming increasingly important in design decisions, not only in mobile processors, but also now in high-performance processors. Power-conscious design must therefore go beyond technology and low-level design, but also change the way modern processors are architected. A multithreading processor is attractive in the context of low-power or power-constrained devices for many of the same reasons that enable its high throughput. Primarily, it supplies extra parallelism via multiple threads, allowing the processor to rely much less heavily on speculation. We show that a simultaneous multithreading processor utilizes up to 22% less energy per instruction than a single-threaded architecture. We also explore other power optimizations that are particular to multithreaded architectures, either because they are unavailable to or unreasonable for single-thread architectures.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127826355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Kang, Wei Huang, Seung-Moon Yoo, D. Franklin, Zhenzhou Ge, V. Lam, P. Pattnaik, J. Torrellas
{"title":"FlexRAM: Toward an advanced Intelligent Memory system","authors":"Y. Kang, Wei Huang, Seung-Moon Yoo, D. Franklin, Zhenzhou Ge, V. Lam, P. Pattnaik, J. Torrellas","doi":"10.1109/ICCD.2012.6378608","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378608","url":null,"abstract":"Major advances in Merged Logic DRAM (MLD) technology coupled with the popularization of memory-intensive applications provide fertile ground for architectures based on Intelligent Memory (IRAM) or Processors-in-Memory (PIM). The contribution of this paper is to explore one way to use the current state-of-the-art MLD technology for general-purpose computers. To satisfy requirements of general purpose and low programming cost, we place the PIM chips in the memory system and let them default to plain DRAM if the application is not enabled for intelligent memory. Since wide usability is crucial, we identify and analyze a range of real applications for PIM. Based on the requirements of these applications and current technological constraints, we design a PIM chip and a PIM-based memory system. We call the chip FlexRAM. We describe FlexRAMs design and floorplan, and the resulting memory system. Evaluation of the system through simulations shows that 4 FlexRAM chips often allow a workstation to run 25-40 times faster.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131100535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}