Semi-supervised learning approach for construction object detection by integrating super-resolution and mean teacher network
Wen-Jie Zhang, Hua-Ping Wan, Peng-Hua Hu, Hui-Bin Ge, Yaozhi Luo, Michael D. Todd
Journal of Infrastructure Intelligence and Resilience, vol. 3, no. 4, Article 100095. Published 2024-03-08. DOI: 10.1016/j.iintel.2024.100095
https://www.sciencedirect.com/science/article/pii/S2772991524000148
Citations: 0
Abstract
Deep learning-based object detection is used for safety management at construction sites, but these methods require large-scale, high-quality, well-labeled datasets for training. Existing construction datasets are relatively small because labor-intensive annotation is expensive, and the varying quality of construction images also degrades detection performance. To address these dataset limitations, this study proposes a construction object detection method that integrates super-resolution and semi-supervised learning. The method improves the quality of construction images and achieves strong detection performance with limited labeled data. First, the Real-ESRGAN model is used to improve image quality and make construction objects more clearly visible; by enhancing the texture details of low-resolution images, it improves the performance of object detection models. Second, a mean-teacher network is adopted to expand the training set, thereby avoiding labor-intensive annotation work. To verify its effectiveness, the method is applied to the state-of-the-art YOLOv5 object detection model, trained on construction images from the Site Object Detection Dataset (SODA) with labeled-data proportions of 10% to 50% in 10% steps, plus an extreme case of 5%. Compared with the existing supervised learning method, the proposed method achieves better detection performance. In particular, the gain is larger when the proportion of labeled data is smaller, which is of great practical value in real-world engineering. The experimental results show the potential of the proposed method for improving image quality and reducing the cost of developing construction datasets.
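The two semi-supervised ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no training details, so the function names `split_labeled` and `ema_update`, the image IDs, the random seed, and the decay value are all assumptions chosen for illustration. The first function splits a dataset into labeled and unlabeled parts at a given proportion; the second applies the exponential-moving-average weight update that mean-teacher training generally uses to derive teacher weights from student weights.

```python
import random


def split_labeled(image_ids, labeled_fraction, seed=0):
    """Shuffle image IDs and split them into (labeled, unlabeled) subsets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_labeled = round(len(ids) * labeled_fraction)
    return ids[:n_labeled], ids[n_labeled:]


def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step: teacher <- decay * teacher + (1 - decay) * student.

    Weights are plain dicts of floats here; a real model would apply the
    same rule to every parameter tensor. The decay of 0.99 is an assumed
    default, not a value taken from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    # Hypothetical image IDs standing in for a construction dataset.
    image_ids = [f"img_{i:04d}" for i in range(1000)]

    # Labeled proportions mentioned in the abstract: 5%, then 10% to 50%.
    for fraction in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
        labeled, unlabeled = split_labeled(image_ids, fraction)
        print(f"{fraction:.0%}: {len(labeled)} labeled, {len(unlabeled)} unlabeled")

    # The teacher drifts slowly toward the student over successive steps.
    teacher, student = {"w": 0.0}, {"w": 1.0}
    for step in range(3):
        teacher = ema_update(teacher, student, decay=0.9)
        print(f"step {step + 1}: teacher w = {teacher['w']:.3f}")
```

In mean-teacher training, the student is the only network updated by gradient descent. The slowly moving teacher supplies the targets on unlabeled images, which is how the unlabeled portion of the split contributes to training.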