{"title":"Implementation of an Area Efficient High Throughput Architecture for Sparse Matrix LU Factorization","authors":"G. P. Kumar, Chinthala Ramesh","doi":"10.1109/IEMENTech48150.2019.8981319","journal":{"name":"2019 3rd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech)"},"publicationDate":"2019-08-01","citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/IEMENTech48150.2019.8981319"}
Implementation of an Area Efficient High Throughput Architecture for Sparse Matrix LU Factorization
Lower-upper (LU) decomposition is an important step in many scientific computations, because most scientific applications are modeled as systems of linear equations Ax = b. Linear equations also appear in everyday applications such as business profit prediction, income over time, and mileage rate calculation. The complexity of the data makes LU decomposition difficult to parallelize, yet parallelization speeds up the factorization and reduces delay in critical applications ranging from weather forecasting to power system problems such as load flow computation. Field Programmable Gate Arrays (FPGAs) offer abundant logic resources and parallel computing capability that can speed up matrix decomposition. In this work, an area-efficient, high-throughput architecture for sparse matrix LU factorization is designed by modifying the computing steps of the algorithm. Compared with the modified KLU algorithm, the original KLU algorithm occupies more area and delivers lower throughput; the modification reduces the area by 10%. The hardware complexity of implementing sparse LU factorization on an FPGA is 15% lower than on CPU and GPU [4]. In addition, the computing efficiency on GPU and CPU (throughput of 10% to 12%) does not reach the theoretical computing efficiency, that is, the theoretical peak throughput. The hardware efficiency of UMFPACK and SuperLU is very low (typically 1 to 4%) because of poor utilization of floating-point resources.
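For readers unfamiliar with the factorization the abstract refers to, the following is a minimal dense sketch of solving Ax = b through LU decomposition with partial pivoting. It is not KLU, not the modified KLU algorithm, and not the FPGA architecture described in the paper; the function names `lu_decompose` and `lu_solve` and the example matrix are hypothetical and chosen only for illustration. The check that skips zero entries hints at why sparse matrices need far fewer operations than dense ones.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix, List[int]]:
    """Return (L, U, perm) such that row i of L*U equals row perm[i] of A.

    Illustrative dense Gaussian elimination with partial pivoting;
    a real sparse solver would use compressed storage and reordering.
    """
    n = len(a)
    u = [row[:] for row in a]           # working copy that becomes U
    l = [[0.0] * n for _ in range(n)]   # multipliers, unit diagonal added last
    perm = list(range(n))               # perm[i] = original index of row i

    for k in range(n):
        # Choose the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]

        for i in range(k + 1, n):
            if u[i][k] == 0.0:
                continue  # zero entry: no elimination work for this row
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]

    for i in range(n):
        l[i][i] = 1.0
    return l, u, perm


def lu_solve(l: Matrix, u: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve Ax = b given the factors from lu_decompose."""
    n = len(b)
    # Forward substitution: L y = P b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(l[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


# Small sparse-looking example (values are made up for illustration).
A = [[2.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]
b = [5.0, 6.0, 19.0]

L, U, perm = lu_decompose(A)
x = lu_solve(L, U, perm, b)
```

Once L and U are available, additional right-hand sides b can be solved with `lu_solve` alone, which is the usual reason for factorizing A once up front.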