Optimization of mixed-precision iterative refinement using parallelized direct methods
T. Kouya
2022 International Conference on Engineering and Emerging Technologies (ICEET), 2022-10-27
DOI: https://doi.org/10.1109/ICEET56468.2022.10007230
Solving a system of linear equations is one of the most critical tasks in scientific computing, and it is the task the LINPACK benchmark uses to rank TOP500 supercomputers. We have already implemented SIMDized basic linear computation with AVX2 and confirmed through benchmark tests in an x86-64 computing environment that it performs well, demonstrating that SIMDization can accelerate LU decomposition. In this study, we further demonstrate that SIMDized LU decomposition parallelized with OpenMP is faster than the serial version, and that the mixed-precision iterative refinement used to obtain quad-double (QD, 212-bit mantissa) approximations can be optimized. As a result, combining double-double (DD, 106-bit mantissa) and QD arithmetic in the iterative refinement process is more efficient than the DD-MPFR 212-bit combination.
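The general idea of mixed-precision iterative refinement can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes ordinary double precision in place of DD for the LU factorization, and Python's decimal module at 64 digits (roughly 212 bits) in place of QD or MPFR for the residual. It uses no SIMD or OpenMP, and the names lu_factor, lu_solve, and refine are hypothetical.

```python
from decimal import Decimal, getcontext

# Assumption: 64 decimal digits (about 212 bits) stands in for QD precision.
getcontext().prec = 64


def lu_factor(a):
    """LU decomposition with partial pivoting in double precision.

    Returns (lu, piv): lu stores unit-lower L below the diagonal and U on
    and above it; piv[i] is the original index of the row now in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A z = b in double precision using the stored factors."""
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def refine(a, b, max_iter=20, tol=Decimal(10) ** -60):
    """Iterative refinement: low-precision solves, high-precision residuals."""
    n = len(a)
    a_hi = [[Decimal(v) for v in row] for row in a]
    b_hi = [Decimal(v) for v in b]
    lu, piv = lu_factor(a)  # factorize once, in the cheaper precision
    x = [Decimal(v) for v in lu_solve(lu, piv, b)]
    b_norm = max(abs(v) for v in b_hi)
    for _ in range(max_iter):
        # Residual r = b - A x, computed in the higher precision.
        r = [b_hi[i] - sum(a_hi[i][j] * x[j] for j in range(n))
             for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        # Correction z from A z = r, reusing the low-precision factors.
        z = lu_solve(lu, piv, [float(v) for v in r])
        x = [x[i] + Decimal(z[i]) for i in range(n)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    b = [1.0, 2.0, 3.0]
    for value in refine(a, b):
        print(value)
```

The factorization, which dominates the cost, is done once in the cheaper precision. Each refinement step then needs only a high-precision matrix-vector product and a pair of low-precision triangular solves. Speeding up the factorization therefore shortens the whole process, which is why the choice of the lower precision matters.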