Irregular fine-grain parallel computing based on the slide register window architecture of Hitachi SR2201
A. Smyk, M. Tudruj
Proceedings. International Conference on Parallel Computing in Electrical Engineering, 2002
DOI: 10.1109/PCEE.2002.1115194
In this article, an optimization method for the parallelized execution of irregular fine-grain computations is presented. The method was implemented using the pseudo-vector processing (PVP) and sliding window register (SWR) mechanisms provided by the Hitachi SR2201 supercomputer. The general idea of PVP and SWR is to optimize access to large contiguous regions of memory and to execute three kinds of operations placed in loops in parallel: loading data, storing data, and arithmetic operations. The main disadvantage of these mechanisms is that a gain can be obtained only for long loops with regular expressions inside them. In our method, we focused on irregular computations that have no predictable dependencies. We divided a given code into parts and manually optimized the relations between load and store operations, taking into consideration the memory latency and the delays in accessing the needed data. In our implementation, we obtained a speedup through a simple reordering of sequences of access operations to registers and memory.
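The general idea of reordering loads so that memory latency overlaps with arithmetic can be illustrated with a short, portable C sketch. It is only an analogy and does not reproduce the authors' SR2201 code: the hardware PVP/SWR mechanisms are not modeled, and a small ring of temporaries (`win`) stands in for a sliding register window. The function names, the indirect "gather" loop, and the preload distance `PRELOAD_DIST = 4` are all hypothetical choices made for illustration. Any real speedup depends on the compiler and the memory system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical preload distance: how many iterations ahead a load is issued. */
#define PRELOAD_DIST 4

/* Baseline (hypothetical example): each irregular load a[idx[i]] is consumed
 * immediately, so the arithmetic must wait for the full memory latency. */
void gather_scale_naive(size_t n, const double *a, const size_t *idx,
                        const double *b, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double x = a[idx[i]];      /* irregular (indirect) load */
        y[i] = x * b[i] + 1.0;     /* used right after the load */
    }
}

/* Reordered sketch: the load for iteration i + PRELOAD_DIST is issued before
 * the arithmetic of iteration i, using a ring buffer as a stand-in for a
 * sliding register window. Produces the same results as the naive version. */
void gather_scale_reordered(size_t n, const double *a, const size_t *idx,
                            const double *b, double *y)
{
    assert(n == 0 || (a && idx && b && y));
    double win[PRELOAD_DIST];   /* window of values loaded ahead of use */
    size_t prologue = n < PRELOAD_DIST ? n : PRELOAD_DIST;

    /* Prologue: issue the first loads before any arithmetic happens. */
    for (size_t k = 0; k < prologue; ++k)
        win[k] = a[idx[k]];

    /* Steady state: take the value loaded PRELOAD_DIST iterations ago, refill
     * the same slot with the load for i + PRELOAD_DIST, then do the math. */
    for (size_t i = 0; i < n; ++i) {
        size_t slot = i % PRELOAD_DIST;
        double x = win[slot];
        if (i + PRELOAD_DIST < n)
            win[slot] = a[idx[i + PRELOAD_DIST]];
        y[i] = x * b[i] + 1.0;
    }
}
```

Both functions compute the same `y`. Only the order in which memory accesses are issued relative to the arithmetic differs. This mirrors, in spirit, the reordering of access sequences described in the abstract. Issuing each load a fixed distance ahead keeps the transformation valid even though the index pattern `idx` is unpredictable.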