{"title":"Compact convolutional transformers- generative adversarial network for compound fault diagnosis of industrial robot","authors":"","doi":"10.1016/j.engappai.2024.109315","DOIUrl":"10.1016/j.engappai.2024.109315","url":null,"abstract":"<div><p>The safe operation of industrial robots is a major concern in intelligent manufacturing. Accurate compound fault diagnosis is essential to the safe operation of industrial robots, but it is challenging to achieve because compound fault samples are hard to collect. A generative adversarial network (GAN) is a useful tool for addressing the data imbalance issue. However, the computational efficiency of GAN in addressing the data imbalance issue has not been investigated. Hence, this study proposes a lightweight GAN named compact convolutional Transformers-GAN (CCT-GAN) to alleviate the data imbalance issue in compound fault diagnosis modelling. Firstly, the feedback current signals collected from the industrial robot are transformed into time-frequency images via continuous wavelet transformation (CWT). Secondly, CCT-GAN is designed to achieve high-quality fake data generation and compound fault diagnosis modelling without large computational costs. Thirdly, the relation between a single fault and the compound fault is considered in the compound fault diagnosis modelling via multi-hot representation to alleviate the data imbalance issue. An experimental study based on the real-world compound fault dataset of industrial robots reveals that the proposed CCT-GAN shows merits in compound fault diagnosis modelling in comparison with the prevailing algorithms. 
The results indicate that CCT-GAN can improve the performance of compound fault diagnosis when only 100 data samples from each compound fault category are available.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image captioning by diffusion models: A survey","authors":"","doi":"10.1016/j.engappai.2024.109288","DOIUrl":"10.1016/j.engappai.2024.109288","url":null,"abstract":"<div><p>Diffusion models are increasingly favored over traditional approaches like generative adversarial networks (GANs) and auto-regressive transformers due to their remarkable generative capabilities. They demonstrate outstanding performance not only in image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on the utilization of diffusion models solely for image generation, ignoring their potential in image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical void in the literature. Starting with an overview of basic diffusion model principles, we explore the enhancements brought by conditioning or guidance and the implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. 
We also cover existing evaluation metrics, software and libraries, as well as challenges and future directions in the field.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142232166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient identity-preserving and fast-converging hybrid generative adversarial network inversion framework","authors":"","doi":"10.1016/j.engappai.2024.109287","DOIUrl":"10.1016/j.engappai.2024.109287","url":null,"abstract":"<div><p>In this paper, we present a novel Hybrid Generative Adversarial Network (HGAN) inversion framework that enables facial images to be rapidly inverted while preserving identity and personality characteristics. Accurate inversion of facial images requires high precision in computer vision and is critical to the success of future facial manipulations (age progression, regression, accessory, and hair stylization). However, existing methods often fail to preserve the personality characteristics of the real image, negatively affecting the accuracy of manipulations. In this context, our key contribution lies in using a transformer-based strategy to initiate the generator, which effectively models spatial relationships for detailed image processing. This approach is innovative because it leverages transformer structures to enhance image inversion tasks. Additionally, we introduce a novel loss function to enhance convergence speed and reliability, ensuring high accuracy in identity and personality trait preservation. Experimental results show that our method achieves a reconstruction accuracy of 93% and improves inversion time by 86%. This advancement could significantly impact facial manipulation technologies, laying the foundation for a technological breakthrough with potential applications in secure digital authentication systems and personal data protection. Our method may have a significant impact on privacy and security in future studies, contributing to the development of secure digital authentication systems and enhancing the protection of personal data. 
Therefore, our work is crucial for advancing the field of facial image manipulation and ensuring the privacy and security of personal data.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring explainable ensemble machine learning methods for long-term performance prediction of industrial gas turbines: A comparative analysis","authors":"","doi":"10.1016/j.engappai.2024.109318","DOIUrl":"10.1016/j.engappai.2024.109318","url":null,"abstract":"<div><p>In today's modern life, where electricity demand is one of the fundamental necessities, gas turbines play a pivotal role in meeting this demand. As such, it is imperative to address the challenges faced in the field. Current models often rely on simplifying assumptions, neglecting the intricate relationships between variables. This limitation leads to reduced accuracy and reliability, ultimately affecting the overall efficiency of gas turbine systems. Furthermore, the complexity of gas turbine behavior, coupled with the scarcity of comprehensive datasets, exacerbates the problem.</p><p>To address these challenges, this research aimed to develop an advanced model capable of accurately forecasting real gas turbine behavior. The proposed approach leveraged ensemble decision trees, robust preprocessing techniques, and rigorous evaluation using an extensive dataset spanning from 2011 to 2015. The training and validation phases were conducted on data from 2011 to 2014, with the 2015 dataset reserved for evaluation.</p><p>The results demonstrated that the bagging structure outperformed the boosted structure, exhibiting lower complexity and higher reliability. Remarkably, the bagging approach with only 30 estimators achieved a superior root mean square error of 1.4176, outperforming the boosted trees with 200 learners. The model effectively captured the overall gas turbine performance, though it encountered limitations in certain specific operating ranges.</p><p>To further investigate the model's behavior, an evaluation was conducted to assess the effects of the input variables on the output power. 
While the interpretability of the results posed some challenges, the overall findings were deemed acceptable and provide valuable insights for optimizing gas turbine performance. The significance of this research lies in its potential to inform decision-making and enhance the efficiency of gas turbine systems, ultimately contributing to the reliable and sustainable supply of electricity.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unmasking colorectal cancer: A high-performance semantic network for polyp and surgical instrument segmentation","authors":"","doi":"10.1016/j.engappai.2024.109292","DOIUrl":"10.1016/j.engappai.2024.109292","url":null,"abstract":"<div><p>Colorectal cancer (CRC) remains a significant health concern, with colonoscopy serving as the gold standard for diagnosis. Accurately segmenting polyps from colonoscopy images is crucial for detecting polyps and preventing CRC. However, challenges such as varying polyp sizes, blurred edges, and uneven brightness hinder segmentation accuracy. Leveraging artificial intelligence (AI) and robot-assisted surgery mechanisms can aid surgeons and physicians in detecting and treating polyps. To address these challenges, we propose a Colorectal Network (CR-Net), an AI-based encoder-decoder network for precise polyp and surgical instrument segmentation. CR-Net incorporates a pre-trained Visual Geometry Group model with 16 convolution layers (VGG16), attention mechanisms, redesigned skip connections, and horizontal dense connections within a U-Net architecture. The VGG16 encoder captures robust visual features, while redesigned skip connections accommodate complex data dimensions, leading to enhanced segmentation outcomes. Horizontal dense connections transfer overlooked features from the encoder to subsequent layers, further improving segmentation accuracy. Additionally, a spatial attention block enhances spatial features and ensures compatibility during upsampling. Evaluation on datasets including the Kvasir segmentation (Kvasir-SEG) dataset, Computer Vision Center Clinic Database (CVC-ClinicDB), Kvasir-Instrument dataset, and University of Washington Sinus Surgery Live (UW-Sinus-Surgery-Live) dataset demonstrates CR-Net's superior performance, achieving Dice Similarity Coefficients of 96.21%, 96.54%, 96.32%, and 92.84%, respectively, surpassing previous methods. 
These results highlight CR-Net's potential in empowering healthcare professionals through advanced AI-driven engineering applications. By bridging AI techniques with engineering innovations, CR-Net represents a significant advancement in CRC diagnosis and treatment.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sustainable management of polyethylene terephthalate waste flow using a fuzzy frank weighted assessment model","authors":"","doi":"10.1016/j.engappai.2024.109254","DOIUrl":"10.1016/j.engappai.2024.109254","url":null,"abstract":"<div><p>The consequences for the ecosystem of polyethylene terephthalate (PET) waste are becoming increasingly significant and widespread. Companies managing PET waste strive to enhance sustainability in all areas. The development of systematic decision-making approaches and frameworks for PET waste management is strongly needed. This research aims to present a new methodological framework for the categorization of the most efficient PET waste management solutions. The introduced fuzzy Frank weighted sum product assessment (FWESPA) model enables rational and flexible reasoning by nonlinearly processing uncertain information. A nonlinear aggregation function is proposed for the fusion of fuzzy strategic options. It is advantageous in simulating the impact of strategic options on a final decision. An integral part of the introduced fuzzy FWESPA model is a reverse sorting algorithm. This innovative algorithm can improve the performance of traditional normalization techniques. Also, an improved fuzzy Frank ordinal priority approach linear model is formulated to define the significance of evaluation criteria. The comprehensive real-life study demonstrates the proposed decision-analytics-based approach. The results showed the following rankings of considered alternatives: “recycling” (ℤ<sub><em>A</em>2</sub> = 0.8565) > “energy recovery” (ℤ<sub><em>A</em>1</sub> = 0.7364) > “remanufacturing” (ℤ<sub><em>A</em>4</sub> = 0.690) > “incineration” (ℤ<sub><em>A</em>3</sub> = 0.6592). Based on the results presented, alternatives “recycling” (<em>A</em><sub>2</sub>) and “energy recovery” (<em>A</em><sub>1</sub>) represent dominant alternatives with a slight advantage of recycling. 
Research findings can be used when deciding the appropriate way to enhance PET waste handling. The findings also describe the benefits and limitations of each treatment option for PET waste, as well as highlight the crucial challenges.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing prognostics for sparse labeled data using advanced contrastive self-supervised learning with downstream integration","authors":"","doi":"10.1016/j.engappai.2024.109268","DOIUrl":"10.1016/j.engappai.2024.109268","url":null,"abstract":"<div><p>Data-driven Prognostics and Health Management (PHM) requires extensive and well-annotated datasets for developing algorithms that can estimate and predict the health state of systems. However, acquiring run-to-failure data is costly, time-consuming, and often lacks comprehensive sampling of failure states, limiting the effectiveness of PHM models. This paper explores the use of Self-Supervised Learning (SSL) in PHM, addressing key limitations and proposing a novel contrastive SSL approach using a nested siamese network structure to enhance degradation feature representation. The model’s performance with sparse data improves by integrating downstream task information, particularly Remaining Useful Life (RUL) prediction, into the siamese structure during SSL pre-training. This approach enforces a consistency condition that failure times for two samples from the same monitoring sequence be identical. The proposed method demonstrates superior performance on the PRONOSTIA bearing dataset, outperforming state-of-the-art methods even with sparse labeling. Furthermore, the study delves into the impact of the upstream–downstream relationship in learning processes, asserting that fine-tuning significantly enhances RUL prediction by leveraging the foundational behaviors established during pre-training. Fine-tuning refines the model’s ability to capture subtle degradation patterns by building on the initial feature representations learned in pre-training, thereby improving accuracy and robustness in RUL predictions. 
The generalizability of the proposed strategy is confirmed through an end-to-end tool wear prediction in a real industrial environment, illustrating the applicability of the proposed method across various datasets and models, and providing effective solutions for sparse data scenarios in prognostics.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dictionary domain adaptation transformer for cross-machine fault diagnosis of rolling bearings","authors":"","doi":"10.1016/j.engappai.2024.109261","DOIUrl":"10.1016/j.engappai.2024.109261","url":null,"abstract":"<div><p>Domain adaptation (DA) techniques have significantly promoted the fault diagnosis of rolling bearings by leveraging diagnostic knowledge from a labeled source domain to recognize faults in an unlabeled target domain. However, dominant DA models often suffer from inaccurate estimation of distribution discrepancies. This stems from the fact that they perform domain alignment on a batch-by-batch basis, where the distribution discrepancies are evaluated solely using mini-batch data. In this paper, a novel dictionary domain adaptation transformer (DDAT) is proposed to boost cross-machine fault diagnosis of rolling bearings. First, a feature dictionary is constructed to represent domain attributes using multi-batch data, enabling more accurate estimation of the domain gap compared to existing batch-based methods. Second, a novel dictionary adaptation framework is designed to direct the model's focus to inter-domain discrepancy instead of intra-domain variations caused by random sampling in data batches. Third, a domain-shared transformer feature extractor is developed to learn domain-invariant representations by leveraging the inherent advantages of multi-head attention in capturing long-range dependencies. The proposed DDAT method conducts domain adaptation at the dictionary level, benefiting from a more accurate estimation of distribution discrepancies by leveraging the abundant and diverse data in the dictionary. 
Experiments confirm that the proposed DDAT method outperforms the popular deep domain adaptation models in various cross-machine diagnosis tasks of rolling bearings.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BFFN: A novel balanced feature fusion network for fair facial expression recognition","authors":"","doi":"10.1016/j.engappai.2024.109277","DOIUrl":"10.1016/j.engappai.2024.109277","url":null,"abstract":"<div><p>Facial expression recognition (FER) technology has become increasingly mature and applicable in recent years. However, it still suffers from the bias of expression class, which can lead to unfair decisions for certain expression classes in applications. This study aims to mitigate expression class bias through both pre-processing and in-processing approaches. First, we analyze the output of existing models and demonstrate the existence of obvious class bias, particularly for underrepresented expressions. Second, we develop a class-balanced dataset constructed through data generation, mitigating unfairness at the data level. Then, we propose the Balanced Feature Fusion Network (BFFN), a class fairness-enhancing network. The BFFN mitigates the class bias by adding facial action units (AU) to enrich expression-related features and allocating weights in the AU feature fusion process to improve the extraction ability of underrepresented expression features. Finally, extensive experiments on datasets (RAF-DB and AffectNet) provide evidence that our BFFN outperforms existing FER models, improving the fairness by at least 16%.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient detector for detecting surface defects on cold-rolled steel strips","authors":"","doi":"10.1016/j.engappai.2024.109325","DOIUrl":"10.1016/j.engappai.2024.109325","url":null,"abstract":"<div><p>Surface-defect inspection is vital in cold-rolled steel-strip manufacturing, given the complexities of production environments and the high speeds involved. Further, the defects on cold-rolled steel strips are often characterized by their small size, diversity of types, and similarities among different types, posing significant challenges in balancing detection accuracy and efficiency. To address the challenges, we designed a detector based on You Only Look Once version 5 (YOLOv5) to achieve precise detection of surface defects on cold-rolled steel strips. First, a dataset containing seven types of defects was curated, named the Cold-Rolled Steel Defect Dataset (CR7-DET). Next, a feature-extraction network based on residual-like connections within a single residual block (Res2net) was developed to enhance the model’s feature-extraction capability, alongside introducing a multi-head attention module to focus on key information features. To reduce the information loss during feature fusion, we established an adaptive feature-fusion Path Aggregation Network (aff-PAN), which was optimized by designing a lightweight adaptive down-sampling module (LAD) to increase the sensory-field implementation of feature fusion. The ghost convolution effectively reduced the number of parameters and increased the speed without affecting the model’s performance. Finally, experiments were conducted on our CR7-DET and a public dataset (GC10-DET). With a reduced parameter count of 6.85 million, our model achieved a mean average precision (mAP) of 87.6% on CR7-DET and 79.7% on GC10-DET. The experimental results demonstrated that our model achieved a balance between detection accuracy and inference efficiency. 
The model has the potential to reduce scrap rates caused by defects and improve the overall surface quality of cold-rolled steel strips.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}