CPDD: A Cross-Scenario Photovoltaic Defect Detector Based on Fine-Grained Feature Autoencoding and Pseudo-Box Contrastive Learning

IF 2.3, Zone 3 (Engineering Technology), JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Transactions on Semiconductor Manufacturing, Pub Date: 2025-03-15, DOI: 10.1109/TSM.2025.3570323

Zhaoyang Wang;Haiyong Chen;Zhen Cao

Volume 38, Issue 3, pages 612-623. Full text: https://ieeexplore.ieee.org/document/11005435/

Citations: 0

Abstract

Vision foundation models, which rely on large-scale pre-training, have advanced image comprehension capabilities and excel in general scenarios. However, their performance remains suboptimal in specialized tasks such as photovoltaic (PV) cell defect detection. This limitation stems from the models' lack of domain-specific prior knowledge. To address this, we propose a two-stage pre-training framework comprising fine-grained feature autoencoding (FFA) and pseudo-box contrastive learning (PCL), which leverages extensive unlabeled raw images to inject domain expertise into the model. First, the fine-grained feature autoencoder trains a detail-sensitive vision transformer (ViT) backbone by reconstructing the histogram of oriented gradients (HOG) of masked images. Second, we pre-train the detection head through contrastive learning: selective search (SS) generates pseudo-boxes, and paired boxes from two augmented views of an image are treated as positive samples. The abundant unsupervised pseudo-boxes improve the detection head's local representation and localization capabilities. Finally, we fully fine-tune the model with labeled images. Based on this methodology, we build the cross-scenario photovoltaic defect detector (CPDD). Experimental results show that CPDD achieves state-of-the-art (SOTA) mAP50 scores on three benchmarks, outperforming detectors pre-trained on the COCO dataset as well as detectors designed specifically for PV defect detection.
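To make the pseudo-box contrastive idea concrete, the following is a minimal sketch of how paired boxes from two augmented views could be scored as positives. The abstract does not state the loss the authors use, so the InfoNCE form, the temperature value, and the names `l2_normalize` and `info_nce` are assumptions for illustration only. The feature vectors stand in for whatever the detection head would output per pseudo-box.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(feats_a, feats_b, temperature=0.1):
    """Mean InfoNCE loss over paired box features (an assumed loss, not the paper's).

    feats_a[i] and feats_b[i] are features of the same pseudo-box seen in two
    augmented views (the positive pair); every other box in feats_b acts as a
    negative for feats_a[i].
    """
    a = [l2_normalize(f) for f in feats_a]
    b = [l2_normalize(f) for f in feats_b]
    losses = []
    for i, fa in enumerate(a):
        # Cosine similarity to every box in the other view, scaled by temperature.
        logits = [sum(x * y for x, y in zip(fa, fb)) / temperature for fb in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy features for two pseudo-boxes in view A.
view_a = [[1.0, 0.0], [0.0, 1.0]]
# View B with boxes paired correctly, then with the pairing swapped.
aligned = info_nce(view_a, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(view_a, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)
```

The loss is low when each box is most similar to its own counterpart in the other view and high when the pairing is broken, which is the signal that would push a detection head toward consistent local representations. In a real pipeline the boxes would come from a selective search implementation and the features from the detection head; both are outside this sketch.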

Source journal: IEEE Transactions on Semiconductor Manufacturing (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 5.20
Self-citation rate: 11.10%
Articles published: 101
Review time: 3.3 months

Journal description: The IEEE Transactions on Semiconductor Manufacturing addresses the challenging problems of manufacturing complex microelectronic components, especially very large scale integrated circuits (VLSI). Manufacturing these products requires precision micropatterning, precise control of materials properties, ultraclean work environments, and complex interactions of chemical, physical, electrical and mechanical processes.