A High-Efficiency Parallel Mechanism for Canonical Polyadic Decomposition on Heterogeneous Computing Platform
Xiaosong Peng; Laurence T. Yang; Xiaokang Wang; Debin Liu; Jie Li
IEEE Transactions on Computers, vol. 74, no. 10, pp. 3377-3389. Published 2025-07-10.
DOI: 10.1109/TC.2025.3587623
URL: https://ieeexplore.ieee.org/document/11077740/
Impact factor: 3.8 (JCR Q2, Computer Science, Hardware & Architecture). Not open access.
Citations: 0
Abstract
Canonical Polyadic decomposition (CPD) obtains a low-rank approximation of high-order multidimensional tensors as a sum of rank-one tensors, greatly reducing storage and computation overhead. It is increasingly used in lightweight design for artificial intelligence and big data processing. Existing CPD techniques have inherent limitations in achieving high accuracy and high efficiency simultaneously. In this paper, a heterogeneous computing method for CPD is proposed to optimize computing efficiency with guaranteed convergence accuracy. Specifically, a quasi-convex decomposition loss function is constructed and the extreme points of the Kruskal matrix rows are solved. Further, the massively parallelized operators in the algorithm are extracted, a software-hardware integrated scheduling method is designed, and CPD is deployed on heterogeneous computing platforms. Finally, the memory access strategy is optimized to improve memory access efficiency. We tested the algorithm on real-world and synthetic sparse tensor datasets. Numerical experimental results show that, compared with the state-of-the-art method, the proposed method has higher convergence accuracy and computing efficiency. Compared to the standard CPD parallel library, the method achieves efficiency improvements of tens to hundreds of times while maintaining the same accuracy.
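To make the abstract's first sentence concrete, the following is a minimal sketch of what "a sum of rank-one tensors" means for a third-order tensor, and why storing the factors is cheaper than storing the dense tensor. It is not the paper's algorithm: it shows only the CP model, not the quasi-convex loss, the row-wise solver, or the heterogeneous scheduling. The function names `cp_reconstruct` and `storage`, and the toy factor values, are assumptions chosen for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from CP factor matrices.

    A is I x R, B is J x R, C is K x R, each given as a list of rows.
    Entry (i, j, k) of the result is sum_r A[i][r] * B[j][r] * C[k][r],
    which is the sum of R rank-one tensors.
    """
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def storage(I, J, K, R):
    """Return (dense entry count, CP factor entry count) for an I x J x K tensor."""
    return I * J * K, R * (I + J + K)


# Hypothetical toy factors: a 2 x 2 x 2 tensor built from R = 2 components.
A = [[1.0, 0.0], [2.0, 1.0]]
B = [[1.0, 1.0], [0.0, 3.0]]
C = [[1.0, 2.0], [1.0, 0.0]]
T = cp_reconstruct(A, B, C)

dense_entries, factor_entries = storage(100, 100, 100, 10)
```

For a 100 x 100 x 100 tensor at rank 10, the dense form holds 1,000,000 entries while the three factor matrices hold 3,000, which is the kind of storage reduction the abstract refers to.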
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.