A Dynamic Interactive Fusion Model for Extracting Fatigue Features Based on the Audiovisual Data Flow of Air Traffic Controllers

Impact factor: 1.8 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Artificial Intelligence)
Zhiyuan Shen, Xueyan Li, Junqi Bai, Kai Wang, Yifan Xu
Journal: IET Biometrics, vol. 2025, no. 1
Published: 2025-08-22
DOI: 10.1049/bme2/7626919
Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/bme2/7626919
Citations: 0

Abstract

Fatigue among air traffic controllers is a contributing factor in civil aviation crashes. Existing methods for extracting and fusing fatigue features face two main challenges: (1) traditional single-modality fatigue recognition methods have low accuracy, and (2) traditional multimodal methods that fuse features by concatenation disregard correlations between the modalities. This paper proposes an interactive algorithm for fusing and recognizing multimodal fatigue features that combines multihead attention (MHA) and cross-attention (XATTN) and builds on improved speech and facial fatigue recognition models. First, an improved Conformer model, which combines a convolutional module with a transformer encoder, is applied to the radiotelephony communication data of controllers, with the filter bank method used to extract deep speech features. Second, facial data of controllers are processed by a stack of inverted residual layers that employ pointwise convolutions, which facilitates the extraction of facial features. Third, speech and facial features are fused interactively by combining MHA and XATTN, which achieves high accuracy in recognizing the fatigue state of controllers working in complex operational environments. A series of experiments was conducted on audiovisual datasets collected from actual air traffic control (ATC) missions. In comparisons with four competing methods for fusing multimodal features, the experimental results indicate that the proposed fusion method achieved a recognition accuracy of 99.2%, which was 8.9% higher than that of the speech single-modality model and 0.4% higher than that of the facial single-modality model.
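
The third step in the abstract, interactive fusion through multihead cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`multihead_attention`, `fuse`, `init_params`), the feature dimension, the sequence lengths, the residual connection, the mean pooling, and the random weights are all assumptions chosen for illustration. It only shows the general idea that each modality queries the other, so the fused vector reflects cross-modal correlations rather than a plain concatenation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(query, context, params, num_heads):
    """Scaled dot-product attention with several heads.

    query: (Tq, d) features of the modality that asks.
    context: (Tk, d) features of the modality that is attended to.
    """
    wq, wk, wv, wo = params
    tq, d = query.shape
    tk = context.shape[0]
    dh = d // num_heads
    # Project, then split the feature axis into heads: (heads, T, dh).
    q = (query @ wq).reshape(tq, num_heads, dh).transpose(1, 0, 2)
    k = (context @ wk).reshape(tk, num_heads, dh).transpose(1, 0, 2)
    v = (context @ wv).reshape(tk, num_heads, dh).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, Tq, Tk)
    out = (weights @ v).transpose(1, 0, 2).reshape(tq, d)      # merge heads
    return out @ wo, weights

def init_params(rng, d):
    # Hypothetical random projections; a trained model would learn these.
    return tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

def fuse(speech, face, p_s2f, p_f2s, num_heads=4):
    # Speech attends to face and face attends to speech (cross-attention).
    s_att, _ = multihead_attention(speech, face, p_s2f, num_heads)
    f_att, _ = multihead_attention(face, speech, p_f2s, num_heads)
    # Assumed design: residual connection, mean pooling over time, concatenation.
    s_vec = (speech + s_att).mean(axis=0)
    f_vec = (face + f_att).mean(axis=0)
    return np.concatenate([s_vec, f_vec])

rng = np.random.default_rng(0)
d = 32                                   # illustrative feature dimension
speech = rng.standard_normal((50, d))    # stand-in for 50 speech frames
face = rng.standard_normal((20, d))      # stand-in for 20 video frames
p_s2f = init_params(rng, d)
p_f2s = init_params(rng, d)
fused = fuse(speech, face, p_s2f, p_f2s)
print(fused.shape)  # (64,)
```

In a full system, the fused vector would feed a classifier that outputs the fatigue state; the two feature sequences would come from the speech and facial encoders rather than from random numbers.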

Source journal: IET Biometrics (Computer Science, Artificial Intelligence)
CiteScore: 5.90
Self-citation rate: 0.00%
Articles published: 46
Review time: 33 weeks
About the journal: The field of biometric recognition, the automated recognition of individuals based on their behavioural and biological characteristics, has now reached a level of maturity where viable practical applications are both possible and increasingly available. The biometrics field is characterised especially by its interdisciplinarity since, while focused primarily around a strong technological base, effective system design and implementation often requires a broad range of skills encompassing, for example, human factors, data security and database technologies, psychological and physiological awareness, and so on. Also, the technology focus itself embraces diversity, since the engineering of effective biometric systems requires integration of image analysis, pattern recognition, sensor technology, database engineering, security design and many other strands of understanding.

The scope of the journal is intentionally relatively wide. While focusing on core technological issues, it is recognised that these may be inherently diverse and in many cases may cross traditional disciplinary boundaries. The scope of the journal will therefore include any topics where it can be shown that a paper can increase our understanding of biometric systems, signal future developments and applications for biometrics, or promote greater practical uptake for relevant technologies:

Development and enhancement of individual biometric modalities, including the established and traditional modalities (e.g. face, fingerprint, iris, signature and handwriting recognition) and also newer or emerging modalities (gait, ear-shape, neurological patterns, etc.)
Multibiometrics, theoretical and practical issues, implementation of practical systems, multiclassifier and multimodal approaches
Soft biometrics and information fusion for identification, verification and trait prediction
Human factors and the human-computer interface issues for biometric systems, exception handling strategies
Template construction and template management, ageing factors and their impact on biometric systems
Usability and user-oriented design, psychological and physiological principles and system integration
Sensors and sensor technologies for biometric processing
Database technologies to support biometric systems
Implementation of biometric systems, security engineering implications, smartcard and associated technologies in implementation, implementation platforms, system design and performance evaluation
Trust and privacy issues, security of biometric systems and supporting technological solutions, biometric template protection
Biometric cryptosystems, security and biometrics-linked encryption
Links with forensic processing and cross-disciplinary commonalities
Core underpinning technologies (e.g. image analysis, pattern recognition, computer vision, signal processing, etc.), where the specific relevance to biometric processing can be demonstrated
Applications and application-led considerations
Position papers on technology or on the industrial context of biometric system development
Adoption and promotion of standards in biometrics, improving technology acceptance, deployment and interoperability, avoiding cross-cultural and cross-sector restrictions
Relevant ethical and social issues