Xiaosong Peng , Laurence T. Yang , Jie Li , Wenjun Jiang
Journal: Engineering Applications of Artificial Intelligence, Volume 152, Article 110725
DOI: 10.1016/j.engappai.2025.110725
Published: 2025-04-11 (Journal Article)
Impact Factor: 8.0; JCR Q1 (Automation & Control Systems)
Available at: https://www.sciencedirect.com/science/article/pii/S0952197625007250
A faster heterogeneous parallel computing method for Tucker decomposition
Artificial intelligence (AI) technology is developing rapidly in application fields such as speech recognition, semantic understanding, and computer vision. In particular, the new generation of knowledge-enhanced large language models has gradually become part of the infrastructure of social productivity. As large models evolve toward multimodality, the input data and model parameters represented as tensors grow ever larger, placing high demands on computation and storage. Tucker decomposition obtains an optimal low-rank representation of a natural tensor through factor matrices and a core tensor, reducing storage and computation requirements in big-data and AI applications. However, existing Tucker decomposition methods exhibit limited computational speed and convergence performance. This paper proposes a general heterogeneous computing framework for Tucker decomposition. It analyzes the row independence of the factor matrices and the column independence of the Kruskal matrices, then updates the Kruskal matrices (in place of the core tensor) column by column and the factor matrices row by row, reducing the storage overhead of the computation. Furthermore, the method uses a heterogeneous computing platform to accelerate the computational bottlenecks and exploits fine-grained parallel optimization to improve memory-access efficiency. Experimental results show a 3.1x to 75.4x speedup over the latest methods, along with the best convergence performance among all methods compared.
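For readers unfamiliar with the factorization the abstract refers to, the following is a minimal NumPy sketch of a standard truncated higher-order SVD (one common way to compute a Tucker decomposition). It is not the authors' heterogeneous parallel method; the function names `unfold`, `hosvd`, and `reconstruct` are illustrative choices.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: bring `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated higher-order SVD: each factor matrix is the leading
    left singular vectors of the mode-n unfolding; the core tensor is
    T projected onto those factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Mode-n product of the running core with U^T.
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

def reconstruct(core, factors):
    # Multiply the core by each factor matrix along its mode.
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(
            np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T
```

When the tensor has exact multilinear rank matching `ranks`, this reconstruction is exact; in general it is only an approximation, which is why iterative schemes (and the acceleration techniques the paper studies) are used in practice.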
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.