A novel and scalable multimodal large language model architecture Tool-MMGPT for future tool wear prediction in titanium alloy high-speed milling processes

IF 8.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Caihua Hao, Zhaoyu Wang, Xinyong Mao, Songping He, Bin Li, Hongqi Liu, Fangyu Peng, Weiye Li
Journal: Computers in Industry, volume 169, Article 104302 (Journal Article, not open access)
Publication date: 2025-04-30
DOI: 10.1016/j.compind.2025.104302
URL: https://www.sciencedirect.com/science/article/pii/S0166361525000673
Citations: 0

Abstract

Accurately predicting the future wear of cutting tools with variable geometric parameters remains a significant challenge. Existing methods lack the capability to model long-term temporal dependencies and predict future wear values, a key characteristic of world models. To address this challenge, we introduce the Tool-Multimodal Generative Pre-trained Transformer (Tool-MMGPT), a novel and scalable multimodal large language model (MLLM) architecture specifically designed for tool wear prediction. Tool-MMGPT is presented as the first tool wear world model: it unifies multimodal data, extending beyond conventional static dimensions to incorporate dynamic temporal dimensions. The approach extracts modality-specific information and achieves shared spatiotemporal feature fusion through a cross-modal Transformer. Alignment and joint interpretation then occur within a unified representation space via a multimodal-language projector, which accommodates the comprehensive input characteristics required by world models. The article also proposes a cross-modal fusion module for vibration signals and images, aiming to fully leverage the advantages of multimodal information. Crucially, Tool-MMGPT addresses the limitations of traditional Large Language Models (LLMs) through a generalizable method: by reconstructing the output layer and redefining the training objectives, we repurpose LLMs for numerical regression tasks, thereby connecting textual representations to continuous numerical predictions. This enables direct and accurate long-term forecasting of future wear time series. Extensive experiments conducted on a newly developed multimodal dataset for variable geometry tools demonstrate that Tool-MMGPT significantly outperforms state-of-the-art (SOTA) baseline methods. These results highlight the model's long-context modeling capabilities and illustrate its potential for effective deployment in environments with limited computational resources.
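The two mechanisms the abstract names, cross-modal fusion and an output layer rebuilt for regression, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no layer sizes, loss function, or code, so every name below (cross_attend, mean_pool, RegressionHead), the mean-squared-error objective, and all numbers are assumptions chosen for illustration. The sketch also omits the LLM backbone and the multimodal-language projector entirely, and it uses attention without learned weights.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def cross_attend(queries, keys_values):
    """Toy single-head cross-attention with no learned weights (assumption).

    Each query token (e.g. a vibration feature) is replaced by an
    attention-weighted average of the key/value tokens (e.g. image features).
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        w = softmax(scores)
        fused.append([sum(wj * kv[i] for wj, kv in zip(w, keys_values))
                      for i in range(d)])
    return fused


def mean_pool(tokens):
    """Average a list of token vectors into one hidden vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


class RegressionHead:
    """Hypothetical stand-in for a rebuilt LLM output layer.

    Instead of projecting to vocabulary logits, it maps a hidden vector
    to `horizon` continuous values (future wear values in this sketch).
    """

    def __init__(self, d_model, horizon, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)]
                  for _ in range(horizon)]
        self.b = [0.0] * horizon

    def forward(self, h):
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.W, self.b)]

    def sgd_step(self, h, target, lr=0.1):
        """One gradient step on mean squared error; returns the pre-step loss."""
        pred = self.forward(h)
        n = len(target)
        loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
        for k in range(n):
            g = 2.0 * (pred[k] - target[k]) / n  # dLoss/dpred_k
            for i in range(len(h)):
                self.W[k][i] -= lr * g * h[i]
            self.b[k] -= lr * g
        return loss


# Made-up feature tokens and wear targets, for illustration only.
vibration_tokens = [[0.2, -0.1, 0.4, 0.0], [0.5, 0.3, -0.2, 0.1]]
image_tokens = [[0.1, 0.0, 0.3, 0.2], [0.4, -0.3, 0.1, 0.0], [0.0, 0.2, 0.2, 0.5]]
hidden = mean_pool(cross_attend(vibration_tokens, image_tokens))

head = RegressionHead(d_model=4, horizon=3)
target_wear = [0.10, 0.12, 0.15]
losses = [head.sgd_step(hidden, target_wear) for _ in range(200)]
forecast = head.forward(hidden)
```

The point of the sketch is the change of objective: the head emits a vector of real numbers and is trained with a regression loss, rather than emitting token probabilities trained with cross-entropy. In a real system the hidden vector would come from the LLM backbone rather than from pooled attention output.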
Source journal: Computers in Industry
Category: Engineering & Technology, Computer Science: Interdisciplinary Applications
CiteScore: 18.90
Self-citation rate: 8.00%
Publication volume: 152
Review time: 22 days
About the journal: The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that: • Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry; • Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry; • Foster connections or integrations across diverse application areas of ICT in industry.