VLF-DETR: Integrating Vision-Language and High-Frequency Features for Transmission Line Defect Detection

IF 5.9 | CAS Zone 2 (Engineering Technology) | JCR Q1: ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Instrumentation and Measurement | Pub Date: 2025-07-10 | DOI: 10.1109/TIM.2025.3586346

Ke Zhang;Jiyuan Yang;Jiacun Wang;Zhaoye Zheng;Xin Sheng;Ningxuan Zhang

Volume 74, pp. 1-15. Source: https://ieeexplore.ieee.org/document/11076154/

Citations: 0

Abstract


VLF-DETR: Integrating Vision-Language and High-Frequency Features for Transmission Line Defect Detection

Drone-based image capture combined with deep learning has become a prevalent approach for inspecting transmission lines, allowing efficient detection of defects and anomalies. However, most existing algorithms rely on single-modality information and fail to fully exploit either the textual modality inherent in labels or the distinctive characteristics of inspection images, such as sharply focused foregrounds and defocused backgrounds. To overcome these limitations, this article proposes the vision-language and high-frequency features DEtection TRansformer (VLF-DETR), a detection model that uses a multistage training strategy within the Deformable DETR framework. In the first stage, to address the absence of domain-specific knowledge in the general-purpose vision-language model foundational language and vision alignment (FLAVA), we fine-tune FLAVA to learn both textual and visual features relevant to the power industry. In the second stage, to better incorporate the textual modality, we introduce conditional queries into Deformable DETR, transferring knowledge from FLAVA into the defect detection model. In the third stage, leveraging the structural characteristics of inspection images, we apply a fast Fourier transform (FFT) to extract high-frequency edge features, suppress background noise, and provide spatial priors. Furthermore, an FFT-based loss function (FFT Loss) is introduced to encourage the model to converge on target regions. Experimental results demonstrate that VLF-DETR significantly outperforms baseline methods, offering a novel and effective solution for transmission line defect detection.
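The third-stage idea of using an FFT to keep high-frequency edge content while suppressing smooth, defocused background can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the filter, so an ideal (hard) high-pass mask is assumed here, and the function name `high_frequency_map` and the parameter `cutoff` are hypothetical.

```python
import numpy as np


def high_frequency_map(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return the magnitude of the high-frequency part of a 2-D grayscale image.

    `cutoff` (hypothetical parameter) is the radius of the suppressed
    low-frequency disc in cycles per pixel; the Nyquist limit is 0.5.
    """
    h, w = image.shape
    spectrum = np.fft.fft2(image)

    # Frequency coordinate of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Assumed ideal high-pass mask: drop every bin inside the cutoff radius,
    # which removes the mean and slowly varying (blurred) content.
    mask = radius > cutoff

    filtered = np.fft.ifft2(spectrum * mask)
    # The input is real and the mask is symmetric, so the imaginary part
    # is only numerical round-off and can be discarded.
    return np.abs(filtered.real)


if __name__ == "__main__":
    # Synthetic example: a vertical step edge at column 32.
    img = np.zeros((64, 64))
    img[:, 32:] = 1.0
    hf = high_frequency_map(img, cutoff=0.1)
    print("response at the edge:", hf[:, 32].mean())
    print("response on the flat region:", hf[:, 16].mean())
```

On the synthetic step image, the response concentrates around the edge column and stays small on the flat region, which is the sense in which such a map could act as a spatial prior. A smooth (for example Gaussian) mask would reduce the ringing that a hard cutoff produces.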


Source journal

IEEE Transactions on Instrumentation and Measurement (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 9.00

Self-citation rate: 23.20%

Articles published: 1294

Review time: 3.9 months

Journal description: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.