Xiaosong Peng , Laurence T. Yang , Jie Li , Wenjun Jiang
Journal: Engineering Applications of Artificial Intelligence, Volume 152, Article 110725
DOI: 10.1016/j.engappai.2025.110725
Published: 2025-04-11 (Journal Article)
Impact Factor: 8.0; JCR Q1 (Automation & Control Systems)
Available at: https://www.sciencedirect.com/science/article/pii/S0952197625007250
A faster heterogeneous parallel computing method for Tucker decomposition
Artificial intelligence (AI) technology is developing rapidly in application fields such as speech recognition, semantic understanding, and computer vision. In particular, the new generation of knowledge-enhanced large language models has gradually become part of the infrastructure of social productivity. As large models evolve toward multimodality, the input data and model parameters represented as tensors grow ever larger, placing high demands on computation and storage. Tucker decomposition obtains an optimal low-rank representation of a natural tensor through factor matrices and a core tensor, reducing storage and computation requirements in big-data and AI applications. However, existing Tucker decomposition methods exhibit limited computational speed and convergence performance. This paper proposes a general heterogeneous computing framework for Tucker decomposition. It analyzes the row independence of the factor matrices and the column independence of the Kruskal matrices, then updates the Kruskal matrices (in place of the core tensor) column by column and the factor matrices row by row, reducing the storage overhead of the computation. Furthermore, the method uses a heterogeneous computing platform to accelerate the computational bottlenecks and exploits fine-grained parallel optimization to improve memory-access efficiency. Experimental results show a 3.1x to 75.4x speedup over the latest methods, along with the best convergence performance among all methods compared.
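For readers unfamiliar with the factorization the abstract refers to, the following is a minimal NumPy sketch of a standard truncated higher-order SVD (one common way to compute a Tucker decomposition). It is not the authors' heterogeneous parallel method; the function names `unfold`, `hosvd`, and `reconstruct` are illustrative choices.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: bring `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated higher-order SVD: each factor matrix is the leading
    left singular vectors of the mode-n unfolding; the core tensor is
    T projected onto those factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Mode-n product of the running core with U^T.
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

def reconstruct(core, factors):
    # Multiply the core by each factor matrix along its mode.
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(
            np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T
```

When the tensor has exact multilinear rank matching `ranks`, this reconstruction is exact; in general it is only an approximation, which is why iterative schemes (and the acceleration techniques the paper studies) are used in practice.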
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.