{"title":"Extracting multi-thread with data localities for vector computers","authors":"J. Sheu, Chih-Yung Chang","doi":"10.1109/ICPADS.1994.590357","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a source-to-source compilation strategy to partition vectorized loop programs into multithread execution form. Each partitioned thread consists of instances of statements with localities in vector registers. The multi-threading scheme gives a novel combination of loop unrolling, statement instances reordering, index shifting, vector register reuse exploiting, and multi-threading. Experimental results show that our multithreading scheme assists vector compiler of Convex C38 series to reduce the number of memory accesses and the number of synchronizations among CPUs and usually obtains a better performance.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.1994.590357","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this paper, we propose a source-to-source compilation strategy to partition vectorized loop programs into multithread execution form. Each partitioned thread consists of instances of statements with localities in vector registers. The multi-threading scheme gives a novel combination of loop unrolling, statement instances reordering, index shifting, vector register reuse exploiting, and multi-threading. Experimental results show that our multithreading scheme assists vector compiler of Convex C38 series to reduce the number of memory accesses and the number of synchronizations among CPUs and usually obtains a better performance.