A Dynamic Interactive Fusion Model for Extracting Fatigue Features Based on the Audiovisual Data Flow of Air Traffic Controllers

IF 1.8; Zone 4, Computer Science; JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Biometrics Pub Date : 2025-08-22 DOI:10.1049/bme2/7626919

Zhiyuan Shen, Xueyan Li, Junqi Bai, Kai Wang, Yifan Xu


Citations: 0

Abstract

Fatigue among air traffic controllers is a factor contributing to civil aviation crashes. Existing methods for extracting and fusing fatigue features face two main challenges: (1) traditional single-mode fatigue recognition methods have low accuracy, and (2) traditional multimodal methods based on feature concatenation and fusion disregard correlations in the multimodal data. This paper proposes an interactive algorithm for the fusion and recognition of multimode fatigue features that combines multihead attention (MHA) and cross-attention (XATTN) and is based on improved speech and facial fatigue recognition models. First, an improved conformer model, which combines a convolutional module with a transformer encoder, is proposed for the radiotelephony communication data of controllers; it employs the filter bank method to extract deep speech features. Second, facial data of controllers are processed via pointwise convolutions in a stack of inverted residual layers, which facilitates the extraction of facial features. Third, speech and facial features are fused interactively by combining MHA and XATTN, which achieves high accuracy in recognizing the fatigue state of controllers working in complex operational environments. A series of experiments was conducted with audiovisual data sets collected from actual air traffic control (ATC) missions. Compared with four competing methods for fusing multimodal features, the proposed method achieved a recognition accuracy of 99.2%, which was 8.9% higher than that of a speech single-mode model and 0.4% higher than that of a facial single-mode model.
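To make the third step more concrete, the following is a minimal sketch of bidirectional multihead cross-attention between a speech feature sequence and a facial feature sequence. It is not the authors' implementation: the function names (`multihead_cross_attention`, `fuse`), the toy feature values, the mean pooling, and the final concatenation are all assumptions chosen for illustration, and the learned projection matrices of a real attention layer are omitted (treated as identity) to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multihead_cross_attention(query_seq, context_seq, num_heads):
    """Queries come from one modality, keys and values from the other.

    Each head works on its own slice of the feature dimension.
    Assumption: no learned projections (identity), unlike a trained model.
    """
    dim = len(query_seq[0])
    assert dim % num_heads == 0 and len(context_seq[0]) == dim
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [x[lo:hi] for x in query_seq]
        kv = [x[lo:hi] for x in context_seq]
        heads.append(attention(q, kv, kv))
    # Concatenate the head outputs for every query position.
    return [sum((head[t] for head in heads), []) for t in range(len(query_seq))]


def mean_pool(seq):
    """Average a sequence of vectors into one vector."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def fuse(speech_seq, face_seq, num_heads=2):
    """Hypothetical fusion: each modality attends to the other, then pool and concatenate."""
    speech_ctx = multihead_cross_attention(speech_seq, face_seq, num_heads)
    face_ctx = multihead_cross_attention(face_seq, speech_seq, num_heads)
    return mean_pool(speech_ctx) + mean_pool(face_ctx)


# Toy inputs: 3 speech frames and 2 face frames, each a 4-dimensional feature vector.
speech = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.0]]
face = [[0.2, 0.0, 0.1, 0.3], [0.4, 0.4, 0.2, 0.1]]
fused = fuse(speech, face)
print(len(fused))  # 8: four speech-side and four face-side dimensions
```

Unlike plain concatenation, each half of the fused vector here depends on both modalities, which is the kind of cross-modal correlation the abstract says concatenation-based methods disregard. In a trained model, the fused vector would presumably feed a classifier that outputs the fatigue state.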


Source journal

IET Biometrics (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore

5.90

Self-citation rate

0.00%

Publication volume

Review time

33 weeks

Journal description:

The field of biometric recognition - automated recognition of individuals based on their behavioural and biological characteristics - has now reached a level of maturity where viable practical applications are both possible and increasingly available. The biometrics field is characterised especially by its interdisciplinarity since, while focused primarily around a strong technological base, effective system design and implementation often requires a broad range of skills encompassing, for example, human factors, data security and database technologies, psychological and physiological awareness, and so on. Also, the technology focus itself embraces diversity, since the engineering of effective biometric systems requires integration of image analysis, pattern recognition, sensor technology, database engineering, security design and many other strands of understanding.

The scope of the journal is intentionally relatively wide. While focusing on core technological issues, it is recognised that these may be inherently diverse and in many cases may cross traditional disciplinary boundaries. The scope of the journal will therefore include any topics where it can be shown that a paper can increase our understanding of biometric systems, signal future developments and applications for biometrics, or promote greater practical uptake for relevant technologies:

Development and enhancement of individual biometric modalities including the established and traditional modalities (e.g. face, fingerprint, iris, signature and handwriting recognition) and also newer or emerging modalities (gait, ear-shape, neurological patterns, etc.)
Multibiometrics, theoretical and practical issues, implementation of practical systems, multiclassifier and multimodal approaches
Soft biometrics and information fusion for identification, verification and trait prediction
Human factors and the human-computer interface issues for biometric systems, exception handling strategies
Template construction and template management, ageing factors and their impact on biometric systems
Usability and user-oriented design, psychological and physiological principles and system integration
Sensors and sensor technologies for biometric processing
Database technologies to support biometric systems
Implementation of biometric systems, security engineering implications, smartcard and associated technologies in implementation, implementation platforms, system design and performance evaluation
Trust and privacy issues, security of biometric systems and supporting technological solutions, biometric template protection
Biometric cryptosystems, security and biometrics-linked encryption
Links with forensic processing and cross-disciplinary commonalities
Core underpinning technologies (e.g. image analysis, pattern recognition, computer vision, signal processing, etc.), where the specific relevance to biometric processing can be demonstrated
Applications and application-led considerations
Position papers on technology or on the industrial context of biometric system development
Adoption and promotion of standards in biometrics, improving technology acceptance, deployment and interoperability, avoiding cross-cultural and cross-sector restrictions
Relevant ethical and social issues