AV-FDTI: Audio-visual fusion for drone threat identification
Yizhuo Yang, Shenghai Yuan, Jianfei Yang, Thien Hoang Nguyen, Muqing Cao, Thien-Minh Nguyen, Han Wang, Lihua Xie
Journal of Automation and Intelligence, Vol. 3, Issue 3, pp. 144-151, September 2024
DOI: 10.1016/j.jai.2024.06.002
URL: https://www.sciencedirect.com/science/article/pii/S2949855424000285
Abstract
In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which can transport harmful payloads or cause significant damage, we present AV-FDTI, an Audio-Visual Fusion system for Drone Threat Identification. AV-FDTI fuses features from audio and omnidirectional camera inputs to improve the precision and robustness of drone classification and 3D localization. Specifically, AV-FDTI employs a convolutional recurrent neural network (CRNN) to capture temporal dynamics in the audio domain and a pretrained ResNet50 model to extract image features. To fuse the visual and audio data, we adopt a mechanism based on visual information entropy and cross-attention. Our system is trained on ground-truth annotations generated automatically by a Leica tracking system, which provides millimeter-level accuracy. Comparative evaluations show that our solution outperforms existing systems. To support further research in this field, we will release the code and the wearable AV-FDTI design as open source.
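The abstract does not specify how visual information entropy and cross-attention are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: (1) "visual information entropy" is taken to be the Shannon entropy of a frame's 8-bit grayscale histogram, normalized to [0, 1] and used as a scalar gate on the visual branch; (2) audio tokens (standing in for CRNN outputs) query visual tokens (standing in for ResNet50 features) through single-head scaled dot-product cross-attention with no learned projections. The functions `image_entropy`, `cross_attention`, and `fuse` are hypothetical names introduced for illustration.

```python
import math
from typing import List, Sequence

Token = List[float]


def image_entropy(pixels: Sequence[int], levels: int = 256) -> float:
    """Shannon entropy, in bits, of a grayscale intensity histogram."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    n = len(pixels)
    entropy = 0.0
    for c in counts:
        if c:  # empty bins contribute nothing (0 * log 0 := 0)
            prob = c / n
            entropy -= prob * math.log2(prob)
    return entropy


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Sequence[Token], keys: Sequence[Token],
                    values: Sequence[Token]) -> List[Token]:
    """Single-head scaled dot-product attention (assumed: no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(audio_tokens: Sequence[Token], visual_tokens: Sequence[Token],
         pixels: Sequence[int], levels: int = 256) -> List[Token]:
    """Hypothetical fusion: audio tokens query visual tokens, and the attended
    visual context is scaled by the frame's normalized entropy, then added
    residually to the audio tokens."""
    gate = image_entropy(pixels, levels) / math.log2(levels)  # in [0, 1]
    attended = cross_attention(audio_tokens, visual_tokens, visual_tokens)
    return [[a + gate * v for a, v in zip(a_tok, v_tok)]
            for a_tok, v_tok in zip(audio_tokens, attended)]


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]      # stand-in for CRNN audio features
    visual = [[0.5, 0.5], [2.0, -1.0]]    # stand-in for ResNet50 image features
    dark_frame = [0] * 64                 # zero entropy, so gate = 0
    textured_frame = list(range(256))     # maximal entropy, so gate = 1
    print(fuse(audio, visual, dark_frame))      # equals the audio tokens
    print(fuse(audio, visual, textured_frame))
```

In this sketch, a flat frame (for example, a dark or overexposed image) has zero entropy, so the gate closes and the fused output reduces to the audio tokens; a textured frame opens the gate and lets the attended visual context contribute. A trained system would more likely learn the query, key, and value projections and the gating function rather than fixing them as done here.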