A novel and scalable multimodal large language model architecture Tool-MMGPT for future tool wear prediction in titanium alloy high-speed milling processes

IF 8.2 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers in Industry Pub Date : 2025-04-30 DOI:10.1016/j.compind.2025.104302

Caihua Hao, Zhaoyu Wang, Xinyong Mao, Songping He, Bin Li, Hongqi Liu, Fangyu Peng, Weiye Li

Computers in Industry, Volume 169, Article 104302 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0166361525000673

Citations: 0

Abstract

Accurately predicting the future wear of cutting tools with variable geometric parameters remains a significant challenge. Existing methods lack the capability to model long-term temporal dependencies and predict future wear values—a key characteristic of world models. To address this challenge, we introduce the Tool-Multimodal Generative Pre-trained Transformer (Tool-MMGPT), a novel and scalable multimodal large language model (MLLM) architecture specifically designed for tool wear prediction. Tool-MMGPT pioneers the first tool wear world model by uniquely unifying multimodal data, extending beyond conventional static dimensions to incorporate dynamic temporal dimensions. This approach extracts modality-specific information and achieves shared spatiotemporal feature fusion through a cross-modal Transformer. Subsequently, alignment and joint interpretation occur within a unified representation space via a multimodal-language projector, which effectively accommodates the comprehensive input characteristics required by world models. This article proposes an effective cross-modal fusion module for vibration signals and images, aiming to fully leverage the advantages of multimodal information. Crucially, Tool-MMGPT transcends the limitations of traditional Large Language Models (LLMs) through an innovative yet generalizable method. By fundamentally reconstructing the output layer and redefining training objectives, we repurpose LLMs for numerical regression tasks, thereby establishing a novel bridge that connects textual representations to continuous numerical predictions. This enables the direct and accurate long-term forecasting of future wear time series. Extensive experiments conducted on a newly developed multimodal dataset for variable geometry tools demonstrate that Tool-MMGPT significantly outperforms state-of-the-art (SOTA) baseline methods. 
These results highlight the model's superior long-context modeling capabilities and illustrate its potential for effective deployment in environments with limited computational resources.
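
The abstract's idea of repurposing an LLM for regression (replacing the token-prediction output layer and training on a numerical objective) can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python, the names `regression_head`, `mse`, and `sgd_step` are hypothetical, the `hidden` vector merely stands in for a final hidden state produced by an LLM backbone, and the target wear values are made-up numbers chosen only for illustration. It assumes a simple linear head and a mean-squared-error loss; the paper may use a different head or objective.

```python
import random


def regression_head(hidden, weights, bias):
    """Map one hidden-state vector to len(weights) continuous values.

    This replaces the usual vocabulary projection + softmax with a
    linear layer whose outputs are read directly as future wear values.
    """
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error, standing in for the regression training objective."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sgd_step(hidden, target, weights, bias, lr=0.05):
    """One gradient-descent step on the MSE loss; updates the head in place.

    Returns the loss measured before the update.
    """
    pred = regression_head(hidden, weights, bias)
    n = len(pred)
    for k, (p, t) in enumerate(zip(pred, target)):
        grad = 2.0 * (p - t) / n  # dLoss / dpred_k
        for j, h in enumerate(hidden):
            weights[k][j] -= lr * grad * h
        bias[k] -= lr * grad
    return mse(pred, target)


random.seed(0)
d_model, horizon = 8, 4  # assumed sizes, far smaller than a real LLM

# Stand-in for the backbone's final hidden state (hypothetical values).
hidden = [random.uniform(-1, 1) for _ in range(d_model)]
# Made-up future wear values for the next `horizon` steps.
target = [0.10, 0.12, 0.15, 0.19]

weights = [[random.uniform(-0.1, 0.1) for _ in range(d_model)]
           for _ in range(horizon)]
bias = [0.0] * horizon

losses = [sgd_step(hidden, target, weights, bias) for _ in range(200)]
forecast = regression_head(hidden, weights, bias)
```

The point of the sketch is that the head emits the whole forecast horizon as continuous numbers in one pass, rather than decoding digits token by token. In a full model the gradient would also flow into the projector and, optionally, the backbone.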


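Source journal

Computers in Industry (Engineering & Technology; Computer Science, Interdisciplinary Applications)

CiteScore: 18.90

Self-citation rate: 8.00%

Articles published: 152

Review time: 22 days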

About the journal: The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that:
• Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry;
• Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry;
• Foster connections or integrations across diverse application areas of ICT in industry.