Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001: Latest Publications

Adaptive interfacing with reconfigurable computers

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903347

N. Bergmann, Anwar S. Dawood

Citations: 2

DStride: data-cache miss-address-based stride prefetching scheme for multimedia processors

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903360

G. Hariprakash, R. Achutharaman, A. Omondi

Abstract: Prefetching reduces cache miss latency by moving data up in memory hierarchy before they are actually needed. Recent hardware-based stride prefetching techniques mostly rely on the processor pipeline information (e.g. program counter and branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design and require that existing hardware-based stride prefetching techniques be adapted to the evolving new processor architectures. In this paper we present a new hardware-based stride prefetching technique, called DStride, that is independent of processor pipeline design changes. In this new design, the first-level data cache miss address stream is used for the stride prediction. The miss addresses are separated into load stream and store stream to increase the efficiency of the predictor. They are checked separately against the recent miss address stream to detect the strides. The detected steady strides are maintained in a table that also performs look-ahead stride prefetching when the processor stride reference rate is higher than the prefetch request service rate. We evaluated our design with multimedia workloads using execution-driven simulation with SimpleScalar toolset. Our experiments show that DStride is very effective in reducing overall pipeline stalls due to cache miss latency, especially for stride-intensive applications such as multimedia workloads.
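
The stride detection described in the DStride abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `MissStrideDetector`, the history depth of 4, and the rule that a stride counts as "steady" once a predicted address is actually missed are all hypothetical choices made for illustration. It keeps separate recent-miss histories for loads and stores, compares each new miss against them, and returns a prefetch address once a stride is confirmed.

```python
from collections import deque


class MissStrideDetector:
    """Toy model of miss-address-based stride detection (hypothetical sketch)."""

    def __init__(self, history=4):
        # Separate recent-miss histories for loads and stores, following the
        # abstract's split of the miss stream. The depth is an assumption.
        self.recent = {"load": deque(maxlen=history),
                       "store": deque(maxlen=history)}
        # Per stream: predicted next miss address -> stride that predicts it.
        # A hardware table would be bounded; this dict is not.
        self.expected = {"load": {}, "store": {}}

    def on_miss(self, kind, addr):
        """Record a miss; return a prefetch address if a stride is confirmed."""
        prefetch = None
        expected = self.expected[kind]
        if addr in expected:
            # The miss landed where an earlier candidate predicted, so the
            # stride is treated as steady and the next address is prefetched.
            stride = expected.pop(addr)
            expected[addr + stride] = stride
            prefetch = addr + stride
        else:
            # Compare against the recent misses of the same stream and
            # remember every non-zero difference as a candidate stride.
            for prev in self.recent[kind]:
                stride = addr - prev
                if stride != 0:
                    expected[addr + stride] = stride
        self.recent[kind].append(addr)
        return prefetch
```

With this model, load misses at addresses 100, 164 and 228 yield no prefetch on the first two calls and a prefetch address of 292 on the third; a store miss at 228 yields nothing because the store stream has its own history. Look-ahead prefetching, which the abstract also mentions, is not modelled here.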

Citations: 6

Fault-tolerant routing on Complete Josephus Cubes

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903366

P. Loh, W. Hsu

Citations: 5

Password-capabilities: their evolution from the password-capability system into Walnut and beyond

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903370

R. Pose

Abstract: Since we first devised and defined password capabilities as a new technique for building capability-based operating systems, a number of research systems around the world have used them as the bases for a variety of operating systems. Our original Password-Capability System was implemented on custom built hardware with a novel address translation and protection scheme specifically designed to support password-capabilities. The password-capability concept later formed the basis of Opal developed at the University of Washington, and Mungi from the University of New South Wales, both of which used commercially available hardware. A second generation password-capability based system, Walnut, was developed at Monash University in the 1990s. Walnut was designed to run on commercially available hardware. It addressed some shortcomings of the original Password-Capability System but had to sacrifice some features that depended on hardware support. A third generation system that will extend Walnut to support mandatory security policies and other advanced features is currently being considered. This paper analyses the evolution of the Password-Capability System into Walnut, examines the shortcomings of the systems, and identifies issues to be addressed in the new system.

Citations: 20

Performance evaluation of a partial retraining scheme for defective multi-layer neural networks

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903376

K. Yamamori, T. Abe, S. Horiguchi

Abstract: This paper addresses an efficient stuck-defect compensation scheme for multi-layer artificial neural networks implemented in hardware devices. To compensate for stuck defects, we have proposed a two-stage partial retraining scheme that adjusts weights belonging to a neuron affected by defects based on back-propagation (BP) algorithm between two layers. For input neurons, the partial retraining scheme is applied two times; first-stage between the input layer and the hidden layer, second-stage between the hidden layer and the output layer. The partial retraining scheme does not need any additional circuits if the hardware neural network has circuits for learning. In this paper we discuss the performance of the partial retraining scheme, retraining time, network yield and generalization ability. As a result, the partial retraining scheme could compensate the neuron stuck defects about 10 times faster than the whole network retraining by BP algorithm. In addition, yields of networks are also improved. The partial retraining scheme achieved more than 80% recognition ratio for noisy input patterns when 16% neurons of the network have 0-stuck or 1-stuck defects.

Citations: 2

Application domains for fixed-length block structured architectures

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903353

L. Eeckhout, T. Vander Aa, B. Goeman, H. Vandierendonck, R. Lauwereins, K. De Bosschere

Abstract: In order to tackle the growing complexity and interconnects problem in modern microprocessor architectures, computer architects have come up with new architectural paradigms. A fixed-length block structured architecture (BSA) is one of these paradigms. The basic idea of a BSA is to generate blocks of instructions, called BSA-blocks, statically (by the compiler) and executing these blocks on a decentralized microarchitecture. In this paper, we focus on possible application domains for this architectural paradigm. To investigate this issue, we have set up several experiments with 43 benchmarks coming from the SPECint95, the SPECfp95, the MediaBench suite, plus a set of MPEG-4 like algorithms. The main conclusion of this paper is twofold. First, multimedia applications are less control-intensive than SPECint95 benchmarks and more control-intensive than SPECfp95 benchmarks. As a result, a compiler for a BSA will find more opportunities to fill BSA-blocks with instructions from the actually executed control flow paths for SPECfp95 than for multimedia applications; and more for multimedia applications than for SPECint95. Second, 16 instructions per BSA-block is appropriate for all application domains. Larger BSA-blocks on the other hand, result in higher branch misprediction rates for most applications and lead to a less effective use of the virtual window size.

Citations: 2

The SawMill framework for virtual memory diversity

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903345

M. Aron, J. Liedtke, Kevin Elphinstone, Yoonho Park, T. Jaeger, Luke Deller

Citations: 31

Exploiting Java instruction/thread level parallelism with horizontal multithreading

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903373

Kenji Watanabe, Wanming Chu, Yamin Li

Abstract: Java bytecodes can be executed with the following three methods: a Java interpreter running on a particular machine interprets bytecodes; a Just-in-Time (JIT) compiler translates bytecodes to the native primitives of the particular machine and the machine executes the translated codes; and a Java processor executes bytecodes directly. The first two methods require no special hardware support for the execution of Java bytecodes and are widely used currently. The last method requires an embedded Java processor, picoJavaI or picoJavaII for instance. The picoJavaI and picoJavaII are simple pipelined processors with no ILP (instruction level parallelism) and TLP (thread level parallelism) supports. A so-called MAJC (microprocessor architecture for Java computing) design can exploit ILP and TLP by using a modified VLIW (very long instruction word) architecture and vertical multithreading technique, but it has its own instruction set and cannot execute Java bytecodes directly. In this paper, we investigate a processor architecture which can directly execute Java bytecodes meanwhile can exploit Java ILP and TLP simultaneously. The proposed processor consists of multiple slots implementing horizontal multithreading and multiple functional units shared by all threads executed in parallel. Our architectural simulation results show that the Java processor could achieve an average 20 IPC (instructions per cycle), or 7.33 EIPC (effective IPC), with 8 slots and a 4-instruction scheduling window for each slot. We also check other configurations and give the utilization of functional units as well as the performance improvement with various kinds of working loads.

Citations: 6

Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903363

C. Jesshope

Citations: 29

Stacking them up: a comparison of virtual machines

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI: 10.1109/ACAC.2001.903358

J. Gough

Abstract: A popular trend in current software technology is to gain program portability by compiling programs to an intermediate form based on an abstract machine definition. Such approaches date back at least to the 1970s, but have achieved new impetus based on the current popularity of the programming language Java. Implementations of language Java compile programs to bytecodes understood by the Java Virtual Machine (JVM). More recently Microsoft have released preliminary details of their ".NET" platform, which is based on an abstract machine superficially similar to the JVM. In each case program execution is normally mediated by a just in time compiler (JIT), although in principle interpretative execution is also possible. Although these two competing technologies share some common aims the objectives of the virtual machine designs are significantly different. In particular, the ease with which embedded systems might use small-footprint versions of these virtual machines depends on detailed properties of the machine definitions. In this study, a compiler was implemented which can produce output code that may be run on either the JVM or .NET platforms. The compiler is available in the public domain, and facilitates comparisons to be made both at compile time and at runtime.

Citations: 49