ASPLOS XII: Latest Papers, Page 4

A program transformation and architecture support for quantum uncomputation

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168889

E. Schuchman, T. N. Vijaykumar

Abstract: Quantum computing's power comes from new algorithms that exploit quantum mechanical phenomena for computation. Quantum algorithms differ from their classical counterparts in that they rely on algorithmic structures that are simply not present in classical computing. Just as classical program transformations and architectures have been designed for common classical algorithm structures, quantum program transformations and quantum architectures should be designed with quantum algorithms in mind. Because quantum algorithms come with these new algorithmic structures, the resulting quantum program transformations and architectures may look very different from their classical counterparts. This paper focuses on uncomputation, a critical and prevalent structure in quantum algorithms, and considers how program transformations and architecture support should be designed to accommodate uncomputation. We show a simple quantum program transformation that exposes independence between uncomputation and later computation. We then propose a multicore architecture tailored to this exposed parallelism and a scheduling policy that efficiently maps such parallelism to the multicore architecture. Our policy achieves parallelism between uncomputation and later computation while reducing cumulative communication distance. Our scheduling and architecture allow significant speedup of quantum programs (between 1.8x and 2.8x in Shor's factoring algorithm), while reducing cumulative communication distance by 26%.
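
For readers unfamiliar with the term, the compute, copy, uncompute pattern that the abstract calls uncomputation can be sketched with classical reversible gates. This is only an illustration on ordinary bits, not the paper's program transformation or architecture; the register layout and the function names (`toffoli`, `cnot`, `and_with_uncompute`) are assumptions made up for the example.

```python
def toffoli(bits, c1, c2, target):
    """Reversible AND: flip bits[target] when both control bits are 1."""
    bits[target] ^= bits[c1] & bits[c2]


def cnot(bits, control, target):
    """Reversible copy: flip bits[target] when the control bit is 1."""
    bits[target] ^= bits[control]


def and_with_uncompute(a, b):
    # Hypothetical register layout: [a, b, ancilla, out]
    bits = [a, b, 0, 0]
    toffoli(bits, 0, 1, 2)  # compute: ancilla = a AND b
    cnot(bits, 2, 3)        # copy the result into the output bit
    toffoli(bits, 0, 1, 2)  # uncompute: the same gate again resets ancilla to 0
    return bits


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_with_uncompute(a, b))
```

In this toy version, the final `toffoli` call touches only the inputs and the ancilla, so any later step that reads only `out` does not depend on it. That is the kind of independence between uncomputation and later computation that the abstract describes exposing.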

Citations: 8

A new idiom recognition framework for exploiting hardware-assist instructions

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168905

M. Kawahito, H. Komatsu, T. Moriyama, H. Inoue, T. Nakatani

Abstract: Modern processors support hardware-assist instructions (such as the TRT and TROT instructions on IBM zSeries) to accelerate certain functions such as delimiter search and character conversion. Such special instructions have often been used in high-performance libraries, but they have not been exploited well in optimizing compilers except in some limited cases. We propose a new idiom recognition technique derived from a topological embedding algorithm [4] to detect idiom patterns in the input program more aggressively than previous approaches. Our approach can detect a pattern even if the code segment does not exactly match the idiom. For example, we can detect a code segment that includes additional code within the idiom pattern. We implemented our new idiom recognition approach in the Java Just-In-Time (JIT) compiler that is part of the J9 Java Virtual Machine, and we supported several important idioms for special hardware-assist instructions on the IBM zSeries and on some models of the IBM pSeries. To demonstrate the effectiveness of our technique, we performed two experiments. The first measures how many more patterns we can detect compared to the previous approach. The second measures how much performance improvement we can achieve over the previous approach. For the first experiment, we used the Java Compatibility Kit (JCK) API tests. For the second, we used the IBM XML parser, SPECjvm98, and SPECjbb2000. In summary, relative to a baseline implementation using exact pattern matching, our algorithm converted 75% more loops in the JCK tests. We also observed average performance improvements of 64% for the XML parser, 1% for SPECjvm98, and 2% for SPECjbb2000 on a z990. Finally, we observed that JIT compilation time increases by only 0.32% to 0.44%.
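
To make the delimiter-search idiom mentioned in the abstract concrete, the sketch below shows the kind of byte-scanning loop a compiler might recognize, next to a table-driven scan that loosely models how a translate-and-test style instruction works (each byte indexes a 256-entry table, and the scan stops at the first nonzero entry). This is a simplified model written for illustration. It is not the exact semantics of TRT, and it does not implement the paper's recognition algorithm. All function names are hypothetical.

```python
def find_delimiter_loop(data: bytes, delimiters: bytes) -> int:
    """Plain loop: return the index of the first delimiter byte, or -1."""
    for i, byte in enumerate(data):
        if byte in delimiters:
            return i
    return -1


def build_table(delimiters: bytes) -> bytes:
    """Build a 256-entry table with a nonzero entry for each delimiter byte."""
    table = bytearray(256)
    for d in delimiters:
        table[d] = 1
    return bytes(table)


def find_delimiter_table(data: bytes, table: bytes) -> int:
    """Table-driven scan: stop at the first byte whose table entry is nonzero."""
    for i, byte in enumerate(data):
        if table[byte]:
            return i
    return -1


if __name__ == "__main__":
    text = b"key=value;rest"
    table = build_table(b"=;")
    print(find_delimiter_loop(text, b"=;"), find_delimiter_table(text, table))
```

Both functions return the same index for the same input. The idea behind idiom recognition is that a compiler spots loops shaped like the first function and replaces them with a single hardware-assisted scan that behaves like the second.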

Citations: 7

Stealth prefetching

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168892

J. F. Cantin, Mikko H. Lipasti, James E. Smith

Citations: 48

A comparison of software and hardware techniques for x86 virtualization

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168860

Keith Adams, Ole Agesen

Citations: 765

Tartan: evaluating spatial computation for whole program execution

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168878

M. Mishra, T. Callahan, Tiberiu Chelcea, Girish Venkataramani, S. Goldstein, M. Budiu

Abstract: Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
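
The energy-delay metric cited in the abstract is commonly computed as the product of energy consumed and execution time. The short sketch below works through that arithmetic. The energy and delay figures are invented placeholders chosen only to show the calculation, not measurements from the paper, and the function names are hypothetical.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def edp_improvement(baseline: tuple, candidate: tuple) -> float:
    """Ratio of baseline EDP to candidate EDP; values above 1 favor the candidate."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


if __name__ == "__main__":
    # Placeholder (energy in joules, delay in seconds) pairs, not data from the paper.
    baseline = (8.0, 1.0)
    candidate = (1.0, 0.5)
    print(edp_improvement(baseline, candidate))
```

Because the metric multiplies the two quantities, a design can reach a large energy-delay improvement mostly through energy savings even when its speedup is modest.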

Citations: 111

A spatial path scheduling algorithm for EDGE architectures

ASPLOS XII Pub Date : 2006-10-23 DOI: 10.1145/1168857.1168875

Katherine E. Coons, Xia Chen, D. Burger, K. McKinley, Sundeep K. Kushwaha

Citations: 62

Introspective 3D chips

ASPLOS XII Pub Date : 2006-10-20 DOI: 10.1145/1168857.1168890

Shashidhar Mysore, B. Agrawal, N. Srivastava, Sheng-Chih Lin, K. Banerjee, T. Sherwood

Abstract: While the number of transistors on a chip increases exponentially over time, the productivity that can be realized from these systems has not kept pace. To deal with the complexity of modern systems, software developers are increasingly dependent on specialized development tools such as security profilers, memory leak identifiers, data flight recorders, and dynamic type analysis. Many of these tools require full-system data which covers multiple interacting threads, processes, and processors. Reducing the performance penalty and complexity of these software tools is critical to those developing next-generation applications, and many researchers have proposed adding specialized hardware to assist in profiling and introspection. Unfortunately, while this additional hardware would be incredibly beneficial to developers, its cost must be paid on every single die that is manufactured. In this paper, we argue that a new way to attack this problem is to add specialized analysis hardware built on separate active layers stacked vertically on the processor die using 3D IC technology. This provides modular "snap-on" functionality that could be included with developer systems and omitted from consumer systems to keep the cost impact to a minimum. We describe the advantage of using inter-die vias for introspection and quantify the impact they can have in terms of the area, power, temperature, and routability of the resulting systems. We show that hardware stubs could be inserted into commodity processors at design time that would allow analysis layers to be bonded to development chips, and that these stubs would increase area and power by no more than 0.021 mm² and 0.9%, respectively.

Citations: 58