Ke Zhang;Jiyuan Yang;Jiacun Wang;Zhaoye Zheng;Xin Sheng;Ningxuan Zhang
DOI: 10.1109/TIM.2025.3586346
Journal: IEEE Transactions on Instrumentation and Measurement, vol. 74, pp. 1-15
Published: 2025-07-10
Impact factor: 5.9 (JCR Q1, Engineering, Electrical & Electronic)
URL: https://ieeexplore.ieee.org/document/11076154/
VLF-DETR: Integrating Vision-Language and High-Frequency Features for Transmission Line Defect Detection
Drone-based image capture with deep learning techniques has become a prevalent approach for inspecting transmission lines, allowing for efficient detection of defects and anomalies. However, most existing algorithms rely on single-modality information, failing to fully exploit the textual modality inherent in labels or the unique characteristics of inspection images, such as sharply focused foregrounds and defocused backgrounds. To overcome these limitations, this article proposes the vision-language and high-frequency features DEtection TRansformer (VLF-DETR), a detection model that leverages a multistage training strategy within the Deformable DETR framework. In the first stage, to address the absence of domain-specific knowledge in the general-purpose vision-language model Foundational Language And Vision Alignment (FLAVA), we fine-tune FLAVA to learn both textual and visual features relevant to the power industry. In the second stage, to better incorporate the textual modality, we introduce conditional queries into Deformable DETR, effectively transferring knowledge from FLAVA into the defect detection model. In the third stage, leveraging the structural characteristics of inspection images, we apply a fast Fourier transform (FFT) to extract high-frequency edge features, suppress background noise, and provide spatial priors. In addition, an FFT-based loss function (FFT Loss) is introduced to further ensure that the model converges on target regions. Experimental results demonstrate that VLF-DETR significantly outperforms baseline methods, offering a novel and effective solution for transmission line defect detection.
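The third-stage idea, using an FFT to keep high-frequency edge content while suppressing a smooth, defocused background, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the filter shape, the cutoff, or how the result is fed into Deformable DETR. The function name `high_frequency_map`, the `cutoff` parameter, and the ideal (hard-edged) high-pass mask are all assumptions made for illustration.

```python
import numpy as np


def high_frequency_map(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return the magnitude of a high-pass-filtered grayscale image.

    `cutoff` is a hypothetical parameter: the radius of the blocked
    low-frequency disc, as a fraction of the smaller image dimension.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    h, w = image.shape
    # Move the zero-frequency (DC) component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Distance of every frequency bin from the spectrum centre.
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    # Ideal high-pass mask (an assumption): drop bins inside the cutoff radius.
    mask = dist > cutoff * min(h, w)
    # Back to the spatial domain; the magnitude is large near sharp edges.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.abs(filtered)
```

On a uniform image the output is approximately zero everywhere, while on an image containing a sharp step the response concentrates around the step. That matches the intuition in the abstract that in-focus foreground structures carry more high-frequency energy than a blurred background.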
About the journal:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.