2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

Mamba: A scalable communication centric multi-threaded processor architecture
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378652
Greg Chadwick, S. Moore
Abstract: In this paper we describe Mamba, an architecture designed for multi-core systems. Mamba has two major aims: (i) make on-chip communication explicit to the programmer so they can optimize for it and (ii) support many threads and supply very lightweight communication and synchronization primitives for them. These aims are based on the observations that (i) as feature sizes shrink, on-chip communication becomes relatively more expensive than computation and (ii) as we go increasingly multi-core, we need highly scalable approaches to inter-thread communication and synchronization. We employ a network of processors where a given memory access will always go to the same cache, removing the need for a coherence protocol and allowing the program explicit control over all communication. A presence bit associated with each word provides a very lightweight, fine-grained synchronization primitive. We demonstrate an FPGA implementation with micro-benchmarks of standard spinlock and FIFO implementations and show that presence-bit-based implementations provide more efficient locking and lower-latency FIFO communication compared to a conventional shared memory implementation, whilst also requiring fewer memory accesses. We also show that Mamba performance is insensitive to total thread count, allowing the use of as many threads as desired.
Citations: 5
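The presence-bit idea can be illustrated with a minimal Python sketch of generic full/empty-bit semantics on a single word. It assumes a read that blocks until the bit is set and then clears it, and a write that blocks until the bit is clear; the abstract does not specify Mamba's actual instruction semantics. The names `PresenceWord` and `run_producer_consumer` are hypothetical, and a lock plus condition variable stands in for what Mamba provides in hardware.

```python
import threading


class PresenceWord:
    """One memory word tagged with a presence (full/empty) bit.

    Hypothetical semantics for illustration: write() blocks while the word
    is full; read() blocks while it is empty and clears the bit on return.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._present = False  # the presence bit
        self._value = None

    def write(self, value):
        with self._cond:
            # Wait until a consumer has emptied the word.
            while self._present:
                self._cond.wait()
            self._value = value
            self._present = True
            self._cond.notify_all()

    def read(self):
        with self._cond:
            # Wait until a producer has filled the word.
            while not self._present:
                self._cond.wait()
            value = self._value
            self._present = False
            self._cond.notify_all()
            return value


def run_producer_consumer(n):
    """Pass n values through a single presence-bit word; return what was read."""
    word = PresenceWord()
    received = []

    def consumer():
        for _ in range(n):
            received.append(word.read())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        word.write(i)
    t.join()
    return received
```

In this sketch the handoff needs no separate lock or flag variable in memory; the synchronization state lives alongside the word itself.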
Design methodology for sample preparation on digital microfluidic biochips
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378639
Yi-Ling Hsieh, Tsung-Yi Ho, K. Chakrabarty
Abstract: Recent advances in digital microfluidic biochips have led to a promising future for miniaturized laboratories, with the associated advantages of high sensitivity and reconfigurability. As one of the front-end operations on digital microfluidic biochips, sample preparation plays an important role in biochemical assays and applications. For fast and high-throughput biochemical applications, it is critical to develop an automated design methodology for sample preparation. Prior work in this area does not address design automation for sample preparation. Moreover, it is critical to ensure the correctness of droplets and to recover from errors efficiently during sample preparation, and published work on error recovery is inefficient and impractical for sample preparation. Therefore, in this paper, we present an automated design methodology for sample preparation, including architectural synthesis, layout synthesis, and dynamic error recovery. The proposed algorithm is evaluated on real-life biochemical applications to demonstrate its effectiveness and efficiency. Compared to prior work, the proposed algorithm achieves up to a 48.39% reduction in sample preparation time.
Citations: 30
Architecture and design flow for a debug event distribution interconnect
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378676
A. Azevedo, B. Vermeulen, K. Goossens
Abstract: In this paper, we describe and analyze the architecture of the proposed Debug Event Distribution Interconnect (EDI). The EDI transmits debug events, which are 1-bit signals, between debug entities in different areas of a Network-on-Chip (NoC) based Multi-Processor System-on-Chip. The EDI replicates the NoC topology, with an EDI node instantiated for each underlying NoC data module. Contention in the EDI node is handled by replicating the EDI in layers. EDI generation is automatic and takes as input the cross-triggering patterns, which are not required to follow the communication patterns of the NoC. The generation and routing tool is also presented in this paper. The EDI is evaluated with four different implementations that vary in complexity and in how they handle contention. Using the lowest-area implementation, a single EDI layer occupies around 0.9% of the area of the tested NoCs. These results show that the proposed EDI implementation incurs a low cost on the overall system.
Citations: 1
Post-layout OPE-predicted redundant wire insertion for clock skew minimization
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378695
Jin-Tai Yan, Zhi-Wei Chen
Abstract: Based on the equilibrium concept of adding load to a physical balance, redundant wires can be inserted to minimize the clock skew in an OPE-predicted clock tree. On five tested benchmarks, the experimental results show that the proposed algorithm increases the total load by only 2.8% on average through the insertion of OPE-predicted redundant wires, and reduces the clock skew by 30.85 ps on average, obtaining near-zero-skew results in reasonable CPU time.
Citations: 1
Track assignment considering crosstalk-induced performance degradation
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378696
Qiong Zhao, Jiang Hu
Abstract: Track assignment is a critical step between global routing and detailed routing in modern VLSI chip design. Crosstalk, which is largely determined by wire adjacency, has a significant impact on interconnect delay and circuit performance. Therefore, crosstalk must be restrained in order to satisfy timing constraints. In this work, a novel track assignment algorithm is proposed to reduce crosstalk-induced performance degradation. The problem is formulated as a Traveling Salesman Problem (TSP) and solved by a graph-based heuristic. Experimental results on the ISPD2011 benchmark circuits show that violations of crosstalk bounds can be reduced by up to 99.56% compared to conventional non-constraint-based heuristics.
Citations: 2
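Treating track order as a path through wires makes the TSP connection concrete: placing wires on consecutive tracks visits each wire once, and the cost of each step is the crosstalk between neighbors. The Python sketch below assumes a single panel in which every wire needs its own track, and a hypothetical symmetric `coupling[i][j]` matrix giving the crosstalk cost of putting wires i and j on adjacent tracks. It uses a plain nearest-neighbor heuristic, a standard TSP heuristic chosen for illustration, not necessarily the paper's graph-based heuristic.

```python
def path_cost(order, coupling):
    """Total crosstalk cost of placing wires on consecutive tracks in `order`."""
    return sum(coupling[a][b] for a, b in zip(order, order[1:]))


def nearest_neighbor_order(coupling, start=0):
    """Greedy TSP-path heuristic: put next to the previous wire the unplaced
    wire with the lowest coupling cost to it."""
    n = len(coupling)
    order = [start]
    unplaced = set(range(n)) - {start}
    while unplaced:
        last = order[-1]
        # Ties are broken by the lower wire index so the result is deterministic.
        nxt = min(unplaced, key=lambda w: (coupling[last][w], w))
        order.append(nxt)
        unplaced.remove(nxt)
    return order


def best_greedy_order(coupling):
    """Run the greedy heuristic from every start wire and keep the cheapest path."""
    candidates = (nearest_neighbor_order(coupling, s) for s in range(len(coupling)))
    return min(candidates, key=lambda order: path_cost(order, coupling))


# Hypothetical crosstalk costs between four wires sharing one panel.
EXAMPLE_COUPLING = [
    [0, 9, 1, 5],
    [9, 0, 6, 2],
    [1, 6, 0, 8],
    [5, 2, 8, 0],
]
```

On the example matrix, the naive order `[0, 1, 2, 3]` costs 23, while the greedy search returns `[1, 3, 0, 2]` with cost 8, keeping the strongly coupled pair (0, 1) apart.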
Dynamic phase-based tuning for embedded systems using phase distance mapping
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378653
Tosiron Adegbija, A. Gordon-Ross, Arslan Munir
Abstract: Phase-based tuning specializes a system's tunable parameters to the varying runtime requirements of an application's different phases of execution in order to meet optimization goals. Since the design space of tunable systems can be very large, one of the major challenges in phase-based tuning is determining the best configuration for each phase without incurring significant tuning overhead (e.g., energy and/or performance) during design space exploration. In this paper, we propose phase distance mapping, which directly determines the best configuration for a phase, thereby eliminating design space exploration. Phase distance mapping uses the correlation between a known phase's characteristics and its best configuration to determine a new phase's best configuration from the new phase's characteristics. Experimental results show that our phase distance mapping approach determines configurations within 3% of the optimal configurations on average and yields energy-delay product savings of 26% on average.
Citations: 8
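The core idea of mapping a new phase to a configuration from its characteristics, without exploring the design space, can be sketched in Python as a simplified nearest-known-phase lookup. This is an assumption-laden illustration, not the paper's actual mapping function: the characteristic vector (here a hypothetical pair of cache miss rate and instructions per cycle), the configuration fields `cache_kb` and `assoc`, and the Euclidean distance are all chosen for the example.

```python
import math


def phase_distance(a, b):
    """Euclidean distance between two phase-characteristic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_configuration(new_phase, known_phases):
    """Return the best configuration of the closest already-characterized phase.

    known_phases: list of (characteristics, best_config) pairs. No candidate
    configurations are executed; the decision uses only the distance between
    characteristic vectors.
    """
    _, config = min(known_phases, key=lambda kp: phase_distance(new_phase, kp[0]))
    return config


# Hypothetical characterized phases: (miss_rate, ipc) -> best cache configuration.
KNOWN_PHASES = [
    ((0.02, 0.90), {"cache_kb": 2, "assoc": 1}),
    ((0.15, 0.40), {"cache_kb": 8, "assoc": 4}),
]
```

A new phase with characteristics `(0.12, 0.50)` lies closest to the second known phase, so it is mapped to the 8 KB, 4-way configuration. A practical version would normalize each characteristic so that no single metric dominates the distance.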
Row buffer locality aware caching policies for hybrid memories
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378661
Hanbin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, O. Mutlu
Abstract: Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's, and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM to achieve the low access latency and energy and the high endurance of DRAM, while taking advantage of PCM's large capacity. A key question is what data to cache in DRAM to best exploit the advantages of each technology while avoiding their disadvantages as much as possible. We propose a new caching policy that improves hybrid memory performance and energy efficiency. Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row. Accesses that are row buffer hits incur similar latencies (and energy consumption) in DRAM and PCM, whereas accesses that are row buffer misses incur longer latencies (and higher energy consumption) in PCM. To exploit this, we devise a policy that avoids accessing in PCM the data that frequently cause row buffer misses, because such accesses are costly in terms of both latency and energy. Our policy tracks the row buffer miss counts of recently used rows in PCM and caches in DRAM the rows that are predicted to incur frequent row buffer misses. In addition to row buffer locality, our proposed caching policy also takes into account the high write latencies of PCM. Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type workloads. On the evaluated workloads, the proposed policy achieves a 31% performance gain over an all-PCM memory system and comes within 29% of the performance of an all-DRAM memory system (not taking PCM's capacity benefit into account).
Citations: 197
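The miss-count-based caching decision described in the abstract can be sketched in Python as follows. This is a simplified model under stated assumptions, not the paper's implementation. Each PCM bank keeps a single open row, a per-row counter is incremented on every PCM row buffer miss, and a row is migrated to DRAM once its count reaches a hypothetical `threshold`. The class name and return labels are illustrative, the sketch tracks all rows rather than only recently used ones, and it ignores the write-latency awareness the abstract mentions.

```python
from collections import defaultdict


class RowBufferLocalityCache:
    """Toy row-buffer-locality-aware decision of which PCM rows to cache in DRAM."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.open_row = {}                  # bank -> currently open PCM row
        self.miss_count = defaultdict(int)  # row -> PCM row buffer misses seen
        self.in_dram = set()                # rows already cached in DRAM

    def access(self, bank, row):
        """Service one access and report where it was served from."""
        if row in self.in_dram:
            return "dram"
        if self.open_row.get(bank) == row:
            return "pcm_hit"            # row buffer hit: cheap even in PCM
        self.open_row[bank] = row       # row buffer miss: open the new row
        self.miss_count[row] += 1
        if self.miss_count[row] >= self.threshold:
            self.in_dram.add(row)       # predicted to keep missing: cache it in DRAM
            return "pcm_miss_migrate"
        return "pcm_miss"
```

With a threshold of 2, rows A and B that conflict in bank 0 keep evicting each other from the row buffer and are moved to DRAM, while row C, which enjoys row buffer hits in bank 1, stays in PCM:

```python
cache = RowBufferLocalityCache(threshold=2)
trace = [(0, "A"), (0, "B"), (0, "A"), (0, "B"), (0, "A"), (1, "C"), (1, "C"), (1, "C")]
print([cache.access(bank, row) for bank, row in trace])
```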
Integration of correct-by-construction BIP models into the MetroII design space exploration flow
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378688
Alena Simalatsar, Liangpeng Guo, M. Bozga, R. Passerone
Abstract: Design correctness and performance are major issues that traditional system design flows usually consider separately and with different emphasis. In this paper we show that two design frameworks with different design goals can be meaningfully connected so as to benefit from the advantages of both. We consider BIP for high-level rigorous design and correct-by-construction implementation, and MetroII for low-level platform-based design and performance evaluation.
Citations: 3
SOLE: Speculative one-cycle load execution with scalability, high-performance and energy-efficiency
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378654
Zhen-Hao Zhang, Dong Tong, Xiaoyin Wang, Jiangfang Yi, Keyi Wang
Abstract: Conventional superscalar processors usually contain a large CAM-based LSQ (load/store queue) with poor scalability and high energy consumption. Recent proposals focus only on improving LSQ scalability to increase in-flight instruction capacity, but deliver little performance improvement and poor energy efficiency. This paper presents a novel speculative store-load forwarding mechanism named SOLE (speculative one-cycle load execution). First, SOLE uses address identifiers, rather than the exact memory addresses used by a traditional LSQ, to perform memory disambiguation. Because the address identifier is just a simple hash of the address base and offset, speculative store-load forwarding can be performed earlier, reducing load execution latency and avoiding unnecessary energy consumption by filtering unnecessary accesses to the data cache. Second, SOLE enlarges the forwarding communication range by using an SSN (store sequential number) to determine the age order between stores, which further improves performance. Finally, SOLE is implemented entirely with set-associative structures, which avoids the scalability problem of a CAM-based LSQ. Experiments show that SOLE outperforms the traditional LSQ by 13.57% in performance while consuming only 75.2% of the execution energy of loads and stores.
Citations: 0
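Forwarding keyed by hashed address identifiers, with sequence numbers ordering stores, can be sketched in Python as below. This is a toy model under explicit assumptions, not SOLE's hardware design. The identifier is a hypothetical XOR-fold of base and offset truncated to `id_bits` bits, which avoids a carry-propagating add. A dict stands in for SOLE's set-associative tables, and each store receives an increasing sequence number so that the youngest matching store wins.

```python
class SpeculativeForwarder:
    """Toy store-to-load forwarding keyed by address identifiers."""

    def __init__(self, id_bits=8):
        self.mask = (1 << id_bits) - 1
        self.next_ssn = 0  # store sequence number counter
        self.table = {}    # identifier -> (ssn, data) of the youngest store

    def address_id(self, base, offset):
        """Hypothetical identifier: XOR-fold of base and offset, truncated.

        It can be formed without a full address add, but distinct
        addresses may alias to the same identifier.
        """
        return (base ^ offset) & self.mask

    def store(self, base, offset, data):
        ssn = self.next_ssn
        self.next_ssn += 1
        # A younger store with the same identifier replaces the older entry.
        self.table[self.address_id(base, offset)] = (ssn, data)
        return ssn

    def load(self, base, offset):
        """Return (ssn, data) to forward speculatively, or None to access the data cache."""
        return self.table.get(self.address_id(base, offset))
```

A load that hits in the table skips the data cache access entirely, which is the filtering effect the abstract describes. Because identifiers can alias (for example, `(0x1000, 8)` and `(0x2000, 8)` map to the same 8-bit identifier here), a forwarded value is only a speculation. In this sketch it would have to be checked against the real address before the load commits.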
A polynomial time flow for implementing free-choice Petri-nets
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378645
Pavlos M. Mattheakis, C. Sotiriou, P. Beerel
Abstract: FSM and PTnet control models are pertinent to both software and hardware applications, as both specification and implementation models. The state-based, monolithic FSM model is directly implementable in software or hardware, but cannot model concurrency without state explosion. Interacting FSM models have so far lacked the formal rigor to express the synchronising interactions between different FSMs. The event-based PTnet model can express both concurrency and choice within the same model, but lacks a polynomial-time flow to implementation, as current methods of exposing the event state space require a potentially exponential number of states. In this work, we present a polynomial-complexity flow for transforming a Free-Choice PTnet into a new formalism for Interacting FSMs, i.e., Multiple, Synchronised FSMs (MSFSMs), a compact Interacting FSMs model that is potentially implementable using any existing monolithic FSM implementation method. We believe that such a flow can, in the long term, bridge event-based and state-based models. We present execution time and state space results from exercising our flow on 25 large PTnet specifications describing asynchronous control circuits, and contrast our results with the popular Petrify tool for PTnet state space exploration and circuit implementation. Our results indicate a very significant reduction in both state space size and execution time.
Citations: 2
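Because the flow takes a Free-Choice PTnet as input, a structural free-choice test is a natural first step. The following Python sketch checks the standard free-choice condition: for every input arc (p, t), either t is the only output transition of place p, or p is the only input place of t. It uses a hypothetical representation of the net as a list of (place, transition) input arcs; output arcs do not affect the property and are omitted. This is not the paper's MSFSM transformation.

```python
from collections import defaultdict


def is_free_choice(arcs):
    """Check the free-choice property of a place/transition net.

    arcs: iterable of (place, transition) input arcs, i.e. place -> transition.
    """
    arcs = list(arcs)                  # iterated twice below
    post_of_place = defaultdict(set)   # p -> transitions consuming from p
    pre_of_trans = defaultdict(set)    # t -> places feeding t
    for p, t in arcs:
        post_of_place[p].add(t)
        pre_of_trans[t].add(p)
    # Choice (several consumers of p) is allowed only if each consumer
    # depends on p alone, so choice and synchronization never mix.
    return all(post_of_place[p] == {t} or pre_of_trans[t] == {p} for p, t in arcs)
```

A pure choice (`p0` feeding `t1` and `t2`) and a pure synchronization (`p0` and `p1` both feeding `t1`) are free-choice. Adding `p1 -> t2` to the choice makes `t2` depend on a place shared with a conflicting transition, so the net is no longer free-choice.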