A novel industrial thermoelectric cooler component defect vision transformer detector based on local and global features fusion
Jie Tu, Mengjie Tang, Yong Han, Daren Wei, Kelvin K.L. Wong
Pattern Recognition Letters, Volume 196, Pages 257-266. Published 2025-06-27.
DOI: 10.1016/j.patrec.2025.06.022
URL: https://www.sciencedirect.com/science/article/pii/S0167865525002508
Abstract
Thermoelectric coolers (TECs) are crucial in industries requiring precise temperature control, such as electronics, telecommunications, aerospace, and semiconductor manufacturing. During the manufacturing of TEC components, defects including cracks, pits, and contamination frequently occur, compromising performance and service life. Traditional manual inspection methods are inefficient and error-prone, which motivates an automated and accurate defect detection approach. To address the challenges posed by the subtle, diverse, and randomly distributed defects on TEC components, we propose the Local Feature Enhance and Feature Fusion Network (LFEFFN), a hybrid model integrating convolutional neural networks (CNNs) and Transformer architectures to simultaneously capture local details and global contextual information. Specifically, the model enhances the traditional patch embedding module using affine transformations and overlapping convolutional layers, incorporates a Local Feature Extraction Module (LFEM) based on depthwise separable convolutions, and employs a Global-to-Local Feature Fusion Module (GLFM) to merge the local and global features. Extensive experiments were conducted on a custom TEC dataset of 4800 images representing seven defect states, employing stratified sampling for training, validation, and testing. Cross-domain validation was also performed using the publicly available DAGM 2007 dataset. The LFEFFN achieved a Top-1 accuracy of 94.73 % and a macro-average F1 score of 0.934, outperforming state-of-the-art CNN-based and Transformer-based models. Robustness evaluations under varied lighting (±50 %), rotation (±30°), and resolution changes (50 % and 150 %) demonstrated minimal performance degradation, confirming the model's resilience in complex industrial environments. Cross-domain testing on the DAGM 2007 dataset yielded a Top-1 accuracy of 85.62 %, highlighting the model's strong generalization ability. Ablation studies further validated the contributions of each module and parameter configuration, and deployment analysis showed an average inference time of 0.05 s per image, satisfying real-time industrial application requirements.
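The two headline metrics in the abstract, Top-1 accuracy and macro-average F1, can both be derived from a multi-class confusion matrix. The sketch below is only an illustration of those standard definitions, not the authors' evaluation code. The function names and the 3-class toy matrix are assumptions made up for this example, and scoring a class with no samples and no predictions as 0 is one common convention rather than something the abstract specifies.

```python
from typing import List


def top1_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


def macro_f1(confusion: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true class c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # another class, predicted as c
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no samples and no predictions scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Hypothetical 3-class confusion matrix (made-up counts, not data from the paper).
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

print(f"Top-1 accuracy: {top1_accuracy(toy_confusion):.4f}")
print(f"Macro F1:       {macro_f1(toy_confusion):.4f}")
```

Because macro averaging weights each class equally, a rare defect class that is poorly detected lowers the score as much as a common one would, which is presumably why it is reported alongside Top-1 accuracy for a seven-state dataset.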
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.