Jian Ge, Qin Qin, Shaojing Song, Jinhua Jiang, Zhiwei Shen
{"title":"Unsupervised selective labeling for semi-supervised industrial defect detection","authors":"Jian Ge , Qin Qin , Shaojing Song , Jinhua Jiang , Zhiwei Shen","doi":"10.1016/j.jksuci.2024.102179","DOIUrl":null,"url":null,"abstract":"<div><p>In industrial detection scenarios, achieving high accuracy typically relies on extensive labeled datasets, which are costly and time-consuming. This has motivated a shift towards semi-supervised learning (SSL), which leverages labeled and unlabeled data to improve learning efficiency and reduce annotation costs. This work proposes the unsupervised spectral clustering labeling (USCL) method to optimize SSL for industrial challenges like defect variability, rarity, and complex distributions. Integral to USCL, we employ the multi-task fusion self-supervised learning (MTSL) method to extract robust feature representations through multiple self-supervised tasks. Additionally, we introduce the Enhanced Spectral Clustering (ESC) method and a dynamic selecting function (DSF). ESC effectively integrates both local and global similarity matrices, improving clustering accuracy. The DSF maximally selects the most valuable instances for labeling, significantly enhancing the representativeness and diversity of the labeled data. USCL consistently improves various SSL methods compared to traditional instance selection methods. For example, it boosts Efficient Teacher by 5%, 6.6%, and 7.8% in mean Average Precision(mAP) on the Automotive Sealing Rings Defect Dataset, the Metallic Surface Defect Dataset, and the Printed Circuit Boards (PCB) Defect Dataset with 10% labeled data. 
Our work sets a new benchmark for SSL in industrial settings.</p></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":null,"pages":null},"PeriodicalIF":5.2000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1319157824002684/pdfft?md5=2e9ae7d3bfac3922191cefd8f900c5a6&pid=1-s2.0-S1319157824002684-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of King Saud University-Computer and Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1319157824002684","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
In industrial detection scenarios, high accuracy typically depends on extensive labeled datasets, which are costly and time-consuming to annotate. This has motivated a shift towards semi-supervised learning (SSL), which leverages both labeled and unlabeled data to improve learning efficiency and reduce annotation costs. This work proposes the unsupervised spectral clustering labeling (USCL) method to optimize SSL for industrial challenges such as defect variability, rarity, and complex distributions. As an integral part of USCL, we employ the multi-task fusion self-supervised learning (MTSL) method to extract robust feature representations through multiple self-supervised tasks. Additionally, we introduce the Enhanced Spectral Clustering (ESC) method and a dynamic selecting function (DSF). ESC integrates both local and global similarity matrices, improving clustering accuracy. The DSF selects the most valuable instances for labeling, enhancing the representativeness and diversity of the labeled data. USCL consistently improves various SSL methods compared to traditional instance selection methods. For example, with 10% labeled data it boosts Efficient Teacher by 5%, 6.6%, and 7.8% in mean Average Precision (mAP) on the Automotive Sealing Rings Defect Dataset, the Metallic Surface Defect Dataset, and the Printed Circuit Boards (PCB) Defect Dataset, respectively. Our work sets a new benchmark for SSL in industrial settings.
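The abstract does not give the formulas behind ESC or the DSF, so the following is only a minimal sketch of the general idea it describes: cluster feature vectors with a similarity matrix that blends a local and a global view, then pick one representative instance per cluster for labeling. Everything here is an assumption made for illustration: the function names (`fused_affinity`, `spectral_embedding`, `kmeans`, `select_for_labeling`), the blending weight `alpha`, the neighbour count `k`, the median bandwidth heuristic, and the density-based pick, which is a simple stand-in and not the paper's DSF. The random 2-D points stand in for features that MTSL would produce.

```python
import numpy as np

def fused_affinity(X, k=5, alpha=0.5):
    """Blend a sparse k-NN (local) affinity with a dense Gaussian (global) one."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    sigma2 = np.median(d2[d2 > 0])            # bandwidth heuristic (assumption)
    global_aff = np.exp(-d2 / sigma2)
    # Local view: keep only each point's k nearest neighbours, then symmetrise.
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    mask = np.zeros(d2.shape, dtype=bool)
    mask[np.arange(len(X))[:, None], nn] = True
    mask = mask | mask.T
    local_aff = np.where(mask, global_aff, 0.0)
    W = alpha * local_aff + (1.0 - alpha) * global_aff
    np.fill_diagonal(W, 0.0)
    return W

def spectral_embedding(W, n_clusters):
    """Row-normalised eigenvectors of the symmetric normalised Laplacian."""
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    lap = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)             # eigenvalues in ascending order
    emb = vecs[:, :n_clusters]
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def kmeans(Z, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def select_for_labeling(W, labels):
    """Stand-in selection rule: per cluster, take the most densely connected member."""
    picks = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        density = W[np.ix_(members, members)].sum(axis=1)
        picks.append(int(members[np.argmax(density)]))
    return picks

# Toy data: three well-separated groups of 20 points each (hypothetical features).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in blob_centers])

W = fused_affinity(X)
labels = kmeans(spectral_embedding(W, 3), 3)
picks = select_for_labeling(W, labels)
print("indices proposed for annotation:", picks)
```

Choosing one instance per cluster is the simplest way to get both properties the abstract mentions: the pick is central within its cluster (representativeness), and no two picks come from the same cluster (diversity). A real labeling budget would usually be larger than the number of clusters, which this sketch does not handle.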
About the journal:
In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author-paid open access journal. Authors who submit their manuscript after October 31st, 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University - Computer and Information Sciences is a refereed, international journal that covers all aspects of both the foundations of computer science and its practical applications.