Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors: Latest Articles

An empirical study of datapath, memory hierarchy, and network in SIMD array architectures
M. Herbordt, C. Weems
{"title":"An empirical study of datapath, memory hierarchy, and network in SIMD array architectures","authors":"M. Herbordt, C. Weems","doi":"10.1109/ICCD.1995.528921","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528921","url":null,"abstract":"Although SIMD arrays have been built for 30 years, they have as a class been the subject of few empirical design studies. Using ENPASSANT, a simulation environment developed for that purpose, we analyze several aspects of SIMD array architecture with respect to a test suite of spatially mapped applications. Several surprising results are obtained. With respect to memory hierarchy, we find that adding a level of cache to current PE designs is likely to be advantageous, but that such a cache will look quite different than expected. In particular, we find that associativity has unusual significance and that performance varies inversely with block size. Router network results indicate the importance of support for local transfers, broadcast, and reduction even at the expense of arbitrary permutations. Other communication results point to the appropriate dimensionality of k-ary n-cube networks (2 or 3), and the criticality of supporting bidirectional transfers, even if the overall bandwidth remains unchanged.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123139497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Performance estimation for real-time distributed embedded systems
Ti-Yen Yen, W. Wolf
{"title":"Performance estimation for real-time distributed embedded systems","authors":"Ti-Yen Yen, W. Wolf","doi":"10.1109/ICCD.1995.528792","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528792","url":null,"abstract":"Many embedded computing systems are distributed systems: communicating processes executing on several CPUs/ASICs connected by communication links. This paper describes a new, efficient analysis algorithm to derive tight bounds on the execution time required for an application task executing on a distributed system. Tight bounds are essential to cosynthesis algorithms. Our bounding algorithms are valid for a general problem model: the system can contain several tasks with different periods; each task is partitioned into a set of processes related by data dependencies; the periods and the computation times of processes are bounded but not necessarily constant. Experimental results show that our algorithm can find tight bounds in small amounts of CPU time.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"61 13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 139
Write buffer design for cache-coherent shared-memory multiprocessors
F. Mounes-Toussi, D. Lilja
{"title":"Write buffer design for cache-coherent shared-memory multiprocessors","authors":"F. Mounes-Toussi, D. Lilja","doi":"10.1109/ICCD.1995.528915","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528915","url":null,"abstract":"We evaluate the performance impact of two different write-buffer configurations (one word per buffer entry and one block per buffer entry) and two different write policies (write-through and write-back), when using the partial block invalidation coherence mechanism in a shared-memory multiprocessor. Using an execution-driven simulator, we find that the one word per entry buffer configuration with a write-back policy is preferred for small write-buffer sizes when both buffers have an equal number of data words, and when they have equal hardware cost. Furthermore, when partial block invalidation is supported, we find that a write-through policy is preferred over a write-back policy due to its simpler cache hit detection mechanism, its elimination of write-back transactions, and its competitive-performance when the write-buffer is relatively large.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115072042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Minimal self-correcting shift counters
A.M. Tokarnia, A. Peterson
{"title":"Minimal self-correcting shift counters","authors":"A.M. Tokarnia, A. Peterson","doi":"10.1109/ICCD.1995.528925","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528925","url":null,"abstract":"In some applications of shift counters, self initialization is an advantage. It eliminates the need for complex initialization and guarantees the return to the original state sequence after a temporary failure. The low operating frequencies and large areas of the available self correcting shift counters, however, impose severe limitations to their use. This poor performance is partially due to a widely used design method. It consists of modifying the state diagram of a counter with the desired modulus until a single cycle is left. Due to the additional hardware required to change state transitions, the final circuit tends to be slow and large. The paper presents a technique for determining self correcting shift counters by selecting the feedback functions from a large set of functions. The set is searched for functions satisfying a minimization criterion. Self correcting shift counters with up to 10 stages have been determined. These counters are faster and smaller than the self correcting shift counters available from the literature. A table of self correcting shift counters with 6 stages is included in the paper.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122634417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Logic synthesis for a single large look-up table
R. Murgai, M. Fujita, F. Hirose
{"title":"Logic synthesis for a single large look-up table","authors":"R. Murgai, M. Fujita, F. Hirose","doi":"10.1109/ICCD.1995.528842","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528842","url":null,"abstract":"Logic synthesis for look-up tables (LUTs) has received much attention in the past few years, since Xilinx introduced its LUT-based field-programmable gate array (FPGA) architectures. An m-input LUT can implement any Boolean function of up to m inputs. So the goal of synthesis for such architectures has been to synthesize a circuit in which each function can be implemented by one m-LUT such that either the total number of functions or the number of levels of the circuit is minimized. In this work, we focus on a different though related problem: synthesize the given circuit on a single memory or LUT L, which has a capacity of M bits. In addition to satisfying the memory constraint M, we also wish to minimize the total number of functions to be implemented. The main motivation for the problem comes from the problem of minimizing the simulation time on a hardware accelerator for logic simulation. This accelerator uses memory as a logic primitive. In fact, the problem is also relevant in the context of compile-code or software simulation. Another situation where the problem arises is in synthesis for the FPGA architectures being proposed that have on-chip memory for storing programs and data. The unused memory locations can be used to store logic functions. We show that the existing LUT synthesis methods are inadequate to solve this problem. We propose techniques to solve the problem and present experimental evidence to demonstrate their effectiveness.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128516578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Interrupt-based hardware support for profiling memory system performance
A. Goldberg, J. Trotter
{"title":"Interrupt-based hardware support for profiling memory system performance","authors":"A. Goldberg, J. Trotter","doi":"10.1109/ICCD.1995.528917","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528917","url":null,"abstract":"Fueled by higher clock rates and superscalar technologies, growth in processor speed continues to outpace improvement in memory system performance. Reflecting this trend, architects are developing increasingly complex memory hierarchies to mask the speed gap, compiler writers are adding locality enhancing transformations to better utilize complex memory hierarchies, and applications programmers are recoding their algorithms to exploit memory systems. All of these groups need empirical data on memory system behavior to guide their optimizations. This paper describes how to combine simple hardware support and sampling techniques to obtain such data without appreciably perturbing system performance. The idea is implemented in the Mprof prototype that profiles data stall cycles, first level cache misses, and second level misses on the Sun Sparc 10/41.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128606823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
DART: delay and routability driven technology mapping for LUT based FPGAs
A. Lu, E. Dagless, J. Saul
{"title":"DART: delay and routability driven technology mapping for LUT based FPGAs","authors":"A. Lu, E. Dagless, J. Saul","doi":"10.1109/ICCD.1995.528841","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528841","url":null,"abstract":"A two-phased approach for routability directed delay-optimal mapping of LUT based FPGAs is presented based on the results of stochastic routability analysis. First, delay-optimal mapping is performed which simultaneously minimizes area and delay. Then, the mapped circuits are restructured to alleviate the potential routing congestions. Experimental results indicate that the first phase creates designs which require 17% fewer levels and 40% fewer LUTs than MIS-pga (delay), 11% fewer levels and 37% fewer LUTs than FlowMap-r, and 5% fewer levels and 39% fewer LUTs than TechMap-D. The success of the second phase is confirmed by running a vendor's layout tool APR. It is observed that they are more routable and have less final delays than those produced by other mappers if they are placed and routed.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124960093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Multiprocessor design verification for the PowerPC 620 microprocessor
C. Montemayor, M. Sullivan, Jen-Tien Yen, P. Wilson, R. Evers
{"title":"Multiprocessor design verification for the PowerPC 620 microprocessor","authors":"C. Montemayor, M. Sullivan, Jen-Tien Yen, P. Wilson, R. Evers","doi":"10.1109/ICCD.1995.528809","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528809","url":null,"abstract":"Multiprocessor design verification for the PowerPC 620 microprocessor was challenging due to the 620 Bus protocol complexity, the highly concurrent bus and level 2 (L2) cache interfaces, and the extensive system configurability. In order to verify this functionality, a combination of random and deterministic approaches were used. The Random Test Program Generator (RTPG) and the newly developed Stochastic Concurrent Program Generator (SCPG) tools were used for random verification. In the deterministic front, testcases in C were written to verify specific scenarios. In creating SCPG, we dealt with the design complexity and frequent design changes by abstracting areas of concern as simple languages, writing tools to generate tests, and executing these in the standard verification environment. The added value of these tests is that they exercise true data sharing among processors, are self-checking and resemble commercial multiprocessor code.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"46 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114116798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
The PowerPC 603e microprocessor: an enhanced, low-power, superscalar microprocessor
C. Montemayor, M. Sullivan, Jen-Tien Yen, P. Wilson, R. Evers, K. R. Kishore
{"title":"The PowerPC 603e microprocessor: an enhanced, low-power, superscalar microprocessor","authors":"C. Montemayor, M. Sullivan, Jen-Tien Yen, P. Wilson, R. Evers, K. R. Kishore","doi":"10.1109/ICCD.1995.528810","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528810","url":null,"abstract":"The PowerPC 603e microprocessor is a high performance, low cost, low power microprocessor designed for use in portable computers. The 603e is an enhanced version of the PowerPC 603 microprocessor and extends the performance range of the PowerPC microprocessor family of portable products. The enhancements include increasing the frequency to 100 MHz, doubling the on-chip instruction and data caches to 16 Kbytes each, increasing the cache associativity to 4-way set-associative, adding an extra integer unit, and increasing the throughput of stores and misaligned accesses. Three new bus modes are added to allow for more flexibility in system design. The estimated performance of the 603e at 100 MHz is 120 SPECint92 and 105 SPECfp92. The 603e is fabricated in the same 3.3 volt, 0.5 micron, four-level metal technology as the 603 and contains 2.6 million transistors. The die size is 98 mm². The typical power consumption of the 603e at 100 MHz is 3 watts. Like the 603, the 603e provides three software controllable power-down modes to further extend power saving capability.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125252690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
SSM-MP: more scalability in shared-memory multi-processor
Shigeaki Iwasa, Shu Shing, Hisashi Mogi, Hiroshi Nozuwe, Hiroo Hayashi, Osamu Wakamori, Takashi Ohmizo, Kuninori Tanaka, H. Sakai, M. Saito
{"title":"SSM-MP: more scalability in shared-memory multi-processor","authors":"Shigeaki Iwasa, Shu Shing, Hisashi Mogi, Hiroshi Nozuwe, Hiroo Hayashi, Osamu Wakamori, Takashi Ohmizo, Kuninori Tanaka, H. Sakai, M. Saito","doi":"10.1109/ICCD.1995.528923","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528923","url":null,"abstract":"Bus-based shared-memory multi-processors (SM-MP) have successfully been used commercially, since implementation requires no drastic changes to the programming paradigm. In this paper we propose the memory structure called SSM-MP (Scalable shared-memory multi-processors), aimed to shorten the cache refill latency and to relax the bus bottleneck problem. In this machine, main memory consists of local memories dedicated to each of the processors and something called MTag. MTag is a small piece of hardware that filters out bus traffic headed to the system bus and maintains cache coherency. A popular UNIX (SVR4 ES/MP) was ported. Original OS code works well due to its natural locality. Furthermore, by allocating tasks to the local memory, we were able to reduce the system bus traffic to nearly a quarter. SSM-MP is an effective approach in building a multi-processor system with a medium number (4-32) of processors.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132507700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1