Advancing Matrix Decomposition Efficiency: A Study on FT-Matrix DSP Based SVD Optimization
Anxing Xie, Yonghua Hu, Aobo Cheng, Zhuoyou Tang, P. Liu, Xin Zhang
DOI: 10.1109/CSCloud-EdgeCom58631.2023.00085
Journal of Cloud Computing: Advances, Systems and Applications, pp. 464-469, 2023-07-01
Matrix decomposition is a fundamental operation in linear algebra, with applications in machine learning, signal processing, edge computing, and many other fields. Singular Value Decomposition (SVD) factors a matrix into three matrices: two orthogonal matrices and a diagonal matrix of singular values. As domestic high-performance Digital Signal Processors (DSPs) mature, demand for matrix computation on DSP platforms is growing, which makes efficient DSP implementations of SVD an important research target. However, achieving high performance requires developers who understand the hardware well enough to map the algorithm's characteristics onto limited hardware resources. To reduce the cost of computing the SVD of a matrix, we implement a vectorization mapping method for the SVD algorithm on the FT-M7002. The single instruction, multiple data (SIMD) instructions of the FT-M7002 processor are used to exploit the data-level parallelism in the SVD algorithm. Instead of relying on data movement and the scalar processing unit (SPU), the computation is performed on a single vector processing element (VPE). Additionally, a DMA transfer algorithm is designed to implement matrix transposition and resolve the problem of discontinuous data access. Experimental results show that the optimized SVD algorithm is up to 5.0× faster than the original SVD algorithm on the FT-M7002. Furthermore, the optimized SVD algorithm on the FT-M7002 runs 1.0-2.0× as fast as the optimized SVD algorithm on the TMS320C6678 processor.
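The abstract does not say which SVD algorithm is vectorized, so the following is only an illustrative sketch under that caveat. It uses one-sided (Hestenes) Jacobi SVD, a swapped-in technique chosen because its inner loops are dot products and plane rotations over pairs of columns. In row-major storage those columns are strided, which is the kind of discontinuous access the DMA transposition is said to resolve. Here a hypothetical `transpose` helper stands in for that step so every column becomes a contiguous row, and the element-wise list comprehensions mark where SIMD lanes would apply on the DSP. This is plain Python, not FT-M7002 code, and `transpose`, `dot`, `rotate` and `one_sided_jacobi_svd` are illustrative names.

```python
import math


def transpose(a):
    """Transpose a row-major matrix given as a list of rows.

    Illustrative stand-in for the DMA-based transposition: afterwards,
    column j of the input is the contiguous list transpose(a)[j].
    """
    return [list(col) for col in zip(*a)]


def dot(x, y):
    # Unit-stride multiply-accumulate over two contiguous vectors: the loop
    # shape a SIMD unit can split across its lanes.
    return sum(xi * yi for xi, yi in zip(x, y))


def rotate(x, y, c, s):
    # Apply a plane rotation element-wise to a pair of contiguous vectors.
    return ([c * xi - s * yi for xi, yi in zip(x, y)],
            [s * xi + c * yi for xi, yi in zip(x, y)])


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """SVD of an m x n matrix (m >= n) by one-sided (Hestenes) Jacobi.

    Returns (u_cols, sigma, v_cols) sorted by descending singular value,
    with A = U diag(sigma) V^T; u_cols[j] and v_cols[j] are column j of U and V.
    """
    n = len(a[0])
    cols = transpose(a)  # columns of A stored as contiguous rows
    # V starts as the identity, stored column by column.
    v_cols = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = dot(cols[p], cols[p])
                beta = dot(cols[q], cols[q])
                gamma = dot(cols[p], cols[q])
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                # Smaller root t of t^2 + 2*zeta*t - 1 = 0 gives the rotation
                # that makes the updated columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # The same rotation is applied to A's columns and accumulated into V.
                cols[p], cols[q] = rotate(cols[p], cols[q], c, s)
                v_cols[p], v_cols[q] = rotate(v_cols[p], v_cols[q], c, s)
                rotated = True
        if not rotated:
            break  # a full sweep made no rotation: converged

    # Converged columns equal U * diag(sigma); their norms are the singular values.
    sigma = [math.sqrt(dot(col, col)) for col in cols]
    # Normalize to get U; a zero singular value leaves a zero column.
    u_cols = [[x / sg for x in col] if sg > 0.0 else [0.0] * len(col)
              for col, sg in zip(cols, sigma)]
    order = sorted(range(n), key=lambda j: -sigma[j])
    return ([u_cols[j] for j in order], [sigma[j] for j in order],
            [v_cols[j] for j in order])


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    u, sigma, v = one_sided_jacobi_svd(A)
    print("singular values:", sigma)
```

Keeping the working matrix transposed means each rotation reads and writes only two contiguous vectors, which is the access pattern vector loads favor. The result can be checked through the identity A V = U Σ: the sum of the squared singular values must equal the squared Frobenius norm of A, and U diag(sigma) Vᵀ must reproduce A.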
About the journal:
The Journal of Cloud Computing: Advances, Systems and Applications (JoCCASA) will publish research articles on all aspects of Cloud Computing. Principally, articles will address topics that are core to Cloud Computing, focusing on the Cloud applications, the Cloud systems, and the advances that will lead to the Clouds of the future. Comprehensive review and survey articles that offer up new insights, and lay the foundations for further exploratory and experimental work, are also relevant.