Scalarization on Short Vector Machines

IEEE International Symposium on Performance Analysis of Systems and Software, 2005 (ISPASS 2005). Pub Date: 2005-03-20. DOI: 10.1109/ISPASS.2005.1430573

Yuan Zhao, K. Kennedy


Citations: 9

Abstract

Scalarization converts array statements into loop nests so that they can run on a scalar machine. One technical difficulty of scalarization is that temporary storage often needs to be allocated to preserve the "fetch before store" semantics of array syntax. Many techniques have been developed to reduce the size of this temporary storage and thereby improve memory hierarchy performance. With the emergence of short vector units on modern microprocessors, it is interesting to see how existing scalarization methods can be extended so that the underlying vector infrastructure is fully utilized while temporary storage is kept to a minimum. In this paper, we extend a loop alignment algorithm for scalarization on short vector machines. The revised algorithm not only achieves vector execution with minimum temporary storage, but also handles data alignment properly, which is very important for performance. Our experiments on two types of widely available architectures demonstrate the effectiveness of our strategy.
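
To make the "fetch before store" problem concrete, the following is a minimal sketch in Python, assuming a hypothetical array statement `A(2:N) = A(1:N-1) + B(2:N)` that does not appear in the paper. It is not the loop alignment algorithm the authors extend. It only contrasts a full-size temporary, a naive scalar loop that violates the semantics, a reversed loop that needs no temporary for this particular dependence, and a chunked variant that stands in for a short vector unit (the `width` parameter and all function names are illustrative assumptions).

```python
def array_statement_reference(a, b):
    """Array-syntax semantics: every fetch happens before any store."""
    # Full-size temporary holding the entire right-hand side.
    rhs = [a[i - 1] + b[i] for i in range(1, len(a))]
    return a[:1] + rhs


def scalarize_naive(a, b):
    """Forward scalar loop: wrong, because a[i - 1] was already overwritten."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def scalarize_reversed(a, b):
    """Backward scalar loop: each a[i - 1] is read before it is written."""
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        a[i] = a[i - 1] + b[i]
    return a


def scalarize_short_vector(a, b, width=4):
    """Process chunks of at most `width` elements from high to low indices.

    Each chunk is fetched completely before it is stored, so the only
    temporary is one vector-register-sized chunk.
    """
    a = list(a)
    hi = len(a)
    while hi > 1:
        lo = max(1, hi - width)
        chunk = [a[i - 1] + b[i] for i in range(lo, hi)]  # vector load and add
        a[lo:hi] = chunk                                  # vector store
        hi = lo
    return a
```

In this sketch the chunked version keeps its temporary at `width` elements regardless of the array length, which is the kind of trade-off between vector execution and temporary storage that the abstract describes. Data alignment of the vector loads and stores, which the paper also addresses, is not modeled here.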

