DAP-SDD: Distribution-Aware Pseudo Labeling for Small Defect Detection

Xiaoyan Zhuo, Wolfgang Rahfeldt, Xiaoqian Zhang, Ted Doros, S. Son
Published in: AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)
Publication date: 2022-04-20
DOI: 10.3390/cmsf2022003005 (https://doi.org/10.3390/cmsf2022003005)
Citations: 1

Abstract

Detecting defects, especially small ones that appear in the early manufacturing stages, is critical to achieving a high yield in industrial applications. While numerous modern deep learning models can improve detection performance, they become less effective at detecting small defects in practical applications because labeled data is scarce and the classes are significantly imbalanced in multiple dimensions. In this work, we propose a distribution-aware pseudo labeling method (DAP-SDD) that detects small defects accurately while making effective use of limited labeled data. Specifically, we apply bootstrapping to the limited labeled data and then use the approximated label distribution to guide pseudo label propagation. Moreover, we propose using the t-distribution confidence interval to set the threshold, which generates more pseudo labels with high confidence. DAP-SDD also incorporates data augmentation to enhance the model’s performance and robustness. We conduct extensive experiments on various datasets to validate the proposed method. Our evaluation results show that, overall, the proposed method requires less than 10% of the labeled data to achieve results comparable to those obtained with a fully labeled (100%) dataset, and that it outperforms the state-of-the-art methods. For a dataset of wafer images, the proposed model achieves an AP (average precision) above 0.93 with only four labeled images (i.e., 2% of the labeled data).
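The abstract does not spell out how the bootstrap estimate and the t-distribution confidence interval are combined, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes the label distribution is summarized by the fraction of defect pixels per labeled image, that the bootstrap supplies the standard error of that fraction, and that the upper bound of the t-interval decides how many of the highest-scoring unlabeled pixels become pseudo labels. All function names, the sample ratios, and the random scores are hypothetical. The t quantile is computed numerically with the standard library to keep the sketch self-contained; scipy.stats.t.ppf would be the usual choice.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (the density is symmetric)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Quantile for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bootstrap_means(values, n_resamples, rng):
    """Resample with replacement and record the mean of each resample."""
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def t_interval(values, boot_means, confidence=0.95):
    """t-interval around the sample mean, with the bootstrap standard error (assumption)."""
    center = statistics.fmean(values)
    half = t_ppf((1 + confidence) / 2, len(values) - 1) * statistics.stdev(boot_means)
    return center - half, center + half


def pseudo_label(scores, target_ratio):
    """Mark the top target_ratio share of scores as pseudo-positive (assumption)."""
    k = max(1, round(target_ratio * len(scores)))
    threshold = sorted(scores, reverse=True)[k - 1]
    return threshold, [s >= threshold for s in scores]


rng = random.Random(0)
# Hypothetical defect-pixel fractions measured on four labeled images.
labeled_ratios = [0.012, 0.020, 0.015, 0.018]
boot = bootstrap_means(labeled_ratios, 1000, rng)
low, high = t_interval(labeled_ratios, boot)

# Hypothetical model confidences for 10,000 unlabeled pixels.
scores = [rng.random() for _ in range(10_000)]
threshold, labels = pseudo_label(scores, high)
print(f"interval=({low:.4f}, {high:.4f}) threshold={threshold:.4f} pseudo_labels={sum(labels)}")
```

With only four labeled images the interval has three degrees of freedom, so the t quantile is noticeably wider than the normal one. That is the practical reason to prefer a t-interval over a normal approximation when labeled data is this scarce.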