2012 IEEE 30th International Conference on Computer Design (ICCD)最新文献_第6页

Mamba: A scalable communication centric multi-threaded processor architecture Mamba:一个可扩展的以通信为中心的多线程处理器架构

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378652

Greg Chadwick, S. Moore

{"title":"Mamba: A scalable communication centric multi-threaded processor architecture","authors":"Greg Chadwick, S. Moore","doi":"10.1109/ICCD.2012.6378652","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378652","url":null,"abstract":"In this paper we describe Mamba, an architecture designed for multi-core systems. Mamba has two major aims: (i) make on-chip communication explicit to the programmer so they can optimize for it and (ii) support many threads and supply very lightweight communication and synchronization primitives for them. These aims are based on the observations that: (i) as feature sizes shrink, on-chip communication becomes relatively more expensive than computation and (ii) as we go increasingly multi-core we need highly scalable approaches to inter-thread communication and synchronization. We employ a network of processors where a given memory access will always go to the same cache, removing the need for a coherence protocol and allowing the program explicit control over all communication. A presence bit associated with each word provides a very lightweight, finegrained synchronization primitive. We demonstrate an FPGA implementation with micro-benchmarks of standard spinlock and FIFO implementations and show that presence bit based implementations provide more efficient locking, and lower latency FIFO communications compared to a conventional shared memory implementation whilst also requiring fewer memory accesses. We also show that Mamba performance is insensitive to total thread count, allowing the use of as many threads as desired.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132944411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Design methodology for sample preparation on digital microfluidic biochips 数字微流控生物芯片样品制备的设计方法

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378639

Yi-Ling Hsieh, Tsung-Yi Ho, K. Chakrabarty

{"title":"Design methodology for sample preparation on digital microfluidic biochips","authors":"Yi-Ling Hsieh, Tsung-Yi Ho, K. Chakrabarty","doi":"10.1109/ICCD.2012.6378639","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378639","url":null,"abstract":"Recent advances in digital microfluidic biochips have led to a promising future for miniaturized laboratories, with the associated advantages of high sensitivity and reconfigurability. As one of the front-end operations on digital microfluidic biochips, sample preparation plays an important role in biochemical assays and applications. For fast and high-throughput biochemical applications, it is critical to develop an automated design methodology for sample preparation. Prior work in this area does not provide solutions to the problem of design automation for sample preparation. Moreover, it is critical to ensure the correctness of droplets and recover from errors efficiently during sample preparation. Published work on error recovery is inefficient and impractical for sample preparation. Therefore, in this paper, we present an automated design methodology for sample preparation, including architectural synthesis, layout synthesis, and dynamic error recovery. The proposed algorithm is evaluated on real-life biochemical applications to demonstrate its effectiveness and efficiency. Compared to prior work, the proposed algorithm can achieve up to 48.39% reduction in sample preparation time.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115667227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 30

Architecture and design flow for a debug event distribution interconnect 调试事件分发互连的体系结构和设计流

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378676

A. Azevedo, B. Vermeulen, K. Goossens

引用次数: 1

Post-layout OPE-predicted redundant wire insertion for clock skew minimization 布局后ope预测冗余线插入时钟倾斜最小化

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378695

Jin-Tai Yan, Zhi-Wei Chen

引用次数: 1

Track assignment considering crosstalk-induced performance degradation 考虑串扰引起的性能下降的航迹分配

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378696

Qiong Zhao, Jiang Hu

引用次数: 2

Dynamic phase-based tuning for embedded systems using phase distance mapping 使用相位距离映射的嵌入式系统动态相位调优

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378653

Tosiron Adegbija, A. Gordon-Ross, Arslan Munir

引用次数: 8

Row buffer locality aware caching policies for hybrid memories 混合内存的行缓冲区位置感知缓存策略

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378661

Hanbin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, O. Mutlu

{"title":"Row buffer locality aware caching policies for hybrid memories","authors":"Hanbin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, O. Mutlu","doi":"10.1109/ICCD.2012.6378661","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378661","url":null,"abstract":"Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, to achieve the low access latency and energy, and high endurance of DRAM, while taking advantage of PCM's large capacity. A key question is what data to cache in DRAM to best exploit the advantages of each technology while avoiding its disadvantages as much as possible. We propose a new caching policy that improves hybrid memory performance and energy efficiency. Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row. Accesses that are row buffer hits incur similar latencies (and energy consumption) in DRAM and PCM, whereas accesses that are row buffer misses incur longer latencies (and higher energy consumption) in PCM. To exploit this, we devise a policy that avoids accessing in PCM data that frequently causes row buffer misses because such accesses are costly in terms of both latency and energy. Our policy tracks the row buffer miss counts of recently used rows in PCM, and caches in DRAM the rows that are predicted to incur frequent row buffer misses. Our proposed caching policy also takes into account the high write latencies of PCM, in addition to row buffer locality. Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type workloads. The proposed policy achieves 31% performance gain over an all-PCM memory system, and comes within 29% of the performance of an allDRAM memory system (not taking PCM's capacity benefit into account) on evaluated workloads.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122160000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 197

Integration of correct-by-construction BIP models into the MetroII design space exploration flow 将按施工正确的BIP模型集成到MetroII设计空间探索流程中

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378688

Alena Simalatsar, Liangpeng Guo, M. Bozga, R. Passerone

引用次数: 3

SOLE: Speculative one-cycle load execution with scalability, high-performance and energy-efficiency SOLE:投机的单周期负载执行，具有可扩展性，高性能和能源效率

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378654

Zhen-Hao Zhang, Dong Tong, Xiaoyin Wang, Jiangfang Yi, Keyi Wang

{"title":"SOLE: Speculative one-cycle load execution with scalability, high-performance and energy-efficiency","authors":"Zhen-Hao Zhang, Dong Tong, Xiaoyin Wang, Jiangfang Yi, Keyi Wang","doi":"10.1109/ICCD.2012.6378654","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378654","url":null,"abstract":"Conventional superscalar processors usually contain large CAM-based LSQ (load/store queue) with poor scalability and high energy consumption. Recently proposals only focus on improving the LSQ scalability to increase the in-flight instruction capacity, but with poor performance improvement and energy efficiency. This paper presents a novel speculative store-load forwarding mechanism, named SOLE (speculative one-cycle load execution)1. Firstly, SOLE uses address identifiers to determine the memory disambiguation, rather than the exact memory addresses as the traditional LSQ does. Since the address identifier is just simple hash from the address base and offset, the speculative store-load forwarding could be advanced earlier to reduce the load execution latency and avoid unnecessary energy consumption by filtering unnecessary accesses to the data cache. Secondly, SOLE enlarges the forwarding communication range by using SSN (store sequential number) to determine the age order between stores, which further improves the performance. Finally, the implementation of SOLE all uses set-associative structures that avoid the non-scalable problem of CAM-based LSQ. Experiments show that performance of SOLE outperforms the traditional LSQ by 13.57% in terms of performance, with only 75.2% execution energy consumption of the loads and stores.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126938621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A polynomial time flow for implementing free-choice Petri-nets 实现自由选择petri网的多项式时间流

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378645

Pavlos M. Mattheakis, C. Sotiriou, P. Beerel

{"title":"A polynomial time flow for implementing free-choice Petri-nets","authors":"Pavlos M. Mattheakis, C. Sotiriou, P. Beerel","doi":"10.1109/ICCD.2012.6378645","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378645","url":null,"abstract":"FSM and PTnet control models are pertinent in both software and hardware applications as both specification and implementation models. The state-based, monolithic FSM model is directly implementable in software or hardware, but cannot model concurrency without state explosion. Interacting FSM models have so far lacked the formal rigor for expressing the synchronising interactions between different FSMs. The event-based, PTnet model is able to model both concurrency and choice within the same model, however lacks a polynomial time flow to implementation, as current methods of exposing the event state space require a potentially exponential number of states. In this work, we present a polynomial complexity flow for transforming a Free-Choice PTnet into a new formalism for Interacting FSMs, i.e Multiple, Synchronised FSMs (MSFSMs), a compact Interacting FSMs model, potentially implementable using any existing monolithic FSM implementation method. We believe that such a flow can in the long term bridge the event and state-based models. We present execution time and state space results of exercising our flow on 25 large PTnet specifications, describing asynchronous control circuits, and contrast our results to the popular Petrify tool for PTnet state space exploration and circuit implementation. Our results indicate a very significant reduction in both state space size and execution time.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124508465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2