Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors: Latest Articles

Improving processor performance by simplifying and bypassing trivial computations
J. Yi, D. Lilja
{"title":"Improving processor performance by simplifying and bypassing trivial computations","authors":"J. Yi, D. Lilja","doi":"10.1109/ICCD.2002.1106814","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106814","url":null,"abstract":"During the course of a program's execution, a processor performs many trivial computations; that is, computations that can be simplified or where the result is zero, one, or equal to one of the input operands. This paper shows that, despite compiling a program with aggressive optimizations (-O3), approximately 30% of all arithmetic instructions, which account for 12% of all dynamic instructions, are trivial computations. The amount of trivial computation is not heavily dependent on the program's specific input values. Our results show that eliminating trivial computations dynamically at run-time yields an average speedup of 8% for a typical processor. Even for a very aggressive processor (i.e. one with no functional unit constraints), the average speedup is still 6%. It also is important to note that the area cost (i.e. hardware) required to dynamically detect and eliminate these trivial computations is very low, consisting of only a few comparators and multiplexers.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"42 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129188094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
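The abstract above defines a trivial computation as one that can be simplified or whose result is zero, one, or equal to one of the input operands. As a rough software illustration of that definition only (the paper describes hardware built from comparators and multiplexers, and the function names, opcode strings, and the particular set of cases below are assumptions made for this sketch, not taken from the paper), a check over integer operands might look like this:

```python
from typing import Iterable, Optional, Tuple


def bypass_result(op: str, a: int, b: int) -> Optional[int]:
    """Return the result if (op, a, b) is trivial, else None.

    Hypothetical helper: the opcode names and the chosen cases are
    illustrative assumptions, not the paper's detection logic.
    """
    if op == "add":
        if a == 0:
            return b          # 0 + b == b
        if b == 0:
            return a          # a + 0 == a
    elif op == "sub":
        if b == 0:
            return a          # a - 0 == a
        if a == b:
            return 0          # a - a == 0
    elif op == "mul":
        if a == 0 or b == 0:
            return 0          # anything times zero is zero
        if a == 1:
            return b          # 1 * b == b
        if b == 1:
            return a          # a * 1 == a
    elif op == "div":
        if b == 0:
            return None       # division by zero is never bypassed here
        if a == 0:
            return 0          # 0 / b == 0
        if b == 1:
            return a          # a / 1 == a
        if a == b:
            return 1          # a / a == 1
    return None               # not trivial: must go to a functional unit


def trivial_fraction(trace: Iterable[Tuple[str, int, int]]) -> float:
    """Fraction of instructions in a trace that bypass_result can skip."""
    items = list(trace)
    if not items:
        return 0.0
    hits = sum(1 for op, a, b in items if bypass_result(op, a, b) is not None)
    return hits / len(items)
```

Running `trivial_fraction` over a recorded list of `(op, a, b)` tuples gives a measure in the spirit of the percentages quoted in the abstract, although the figures there come from the authors' own simulations and instruction classes.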
A low energy set-associative I-Cache with extended BTB
Koji Inoue, V. Moshnyaga, K. Murakami
{"title":"A low energy set-associative I-Cache with extended BTB","authors":"Koji Inoue, V. Moshnyaga, K. Murakami","doi":"10.1109/ICCD.2002.1106768","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106768","url":null,"abstract":"This paper proposes a low-energy instruction-cache architecture, called history-based tag-comparison (HBTC) cache. The HBTC cache attempts to re-use tag-comparison results for avoiding unnecessary way activation in set-associative caches. The cache records tag-comparison results in an extended BTB, and re-uses them for directly selecting only the hit-way which includes the target instruction. In our simulation, it is observed that the HBTC cache can achieve 62% of energy reduction, with less than 1% performance degradation, compared with a conventional cache.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121193261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Low-power, high-speed CMOS VLSI design
T. Kuroda
{"title":"Low-power, high-speed CMOS VLSI design","authors":"T. Kuroda","doi":"10.1109/ICCD.2002.1106787","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106787","url":null,"abstract":"Ubiquitous computing is a next generation information technology where computers and communications will be scaled further, merged together, and materialized in consumer applications. Computers will be invisible behind broadband networks as servers, while terminals will come closer to us as wearable/implantable devices, more friendly devices with sophisticated human-computer interactions. IC chips will be implanted everywhere so that things can think and talk for distributed information processing. Key technologies here are low power, low cost, and good interfaces, especially for wireless data communications. Low-power, high-speed CMOS circuit techniques are presented in this paper, including low-voltage design with variable/multiple V_DD/V_TH control, embedded memory technology for reducing capacitance, and low-switching activity design.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128982781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Subword sorting with versatile permutation instructions
Z. Shi, R. Lee
{"title":"Subword sorting with versatile permutation instructions","authors":"Z. Shi, R. Lee","doi":"10.1109/ICCD.2002.1106776","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106776","url":null,"abstract":"Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords in or among registers. Bit-level permutation instructions have also been proposed recently for their importance in cryptography. However, important algorithms, especially those with many conditional control dependencies such as sorting, have not exploited the advantage of subword parallel instructions. In this paper, we show how one of the bit permutation instructions, GRP, can be used for fast sorting. In the process, we demonstrate the versatility of this permutation instruction for uses other than bit permutations. This versatility is important in considering the addition of a new instruction to a general-purpose processor. The results show that our sorting methods have a significant speedup even when compared with the fastest sorting algorithms. We also discuss the hardware implementation of the GRP instruction and compare its latency to a typical processor's cycle time.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129732400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Locating tiny sensors in time and space: a case study
Lewis Girod, Vladimir Bychkovskiy, J. Elson, D. Estrin
{"title":"Locating tiny sensors in time and space: a case study","authors":"Lewis Girod, Vladimir Bychkovskiy, J. Elson, D. Estrin","doi":"10.1109/ICCD.2002.1106773","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106773","url":null,"abstract":"As the cost of embedded sensors and actuators drops, new applications will arise that exploit high density networks of small devices capable of a variety of sensing tasks. Although individual devices may have limited functionality, the true value of the system comes from the emergent behavior that arises when data from many places in the system is combined. This type of data fusion has a number of requirements, but two of the most important are: 1) synchronized time, precise enough to resolve movement in the sensed phenomenon (e.g., sound); and 2) known geographic locations, on a similar scale to the sensors' size and deployment density. However, the installation cost of a localization system with sufficient granularity is considerable, because of the large amount of effort required to deploy such a system and make all the measurements required to tune it. In this paper, we describe a system based on COTS components that incorporates our novel time synchronization and acoustic ranging techniques. The result is a low-cost, readily available platform for distributed, coherent signal processing.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128087691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 287