Hierarchical Event-RGB Interaction Network for single-eye expression recognition

IF 8.1 | CAS Tier 1, Computer Science | COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences | Pub. date: 2024-10-10 | DOI: 10.1016/j.ins.2024.121539

Runduo Han, Xiuping Liu, Yi Zhang, Jun Zhou, Hongchen Tan, Xin Li

Information Sciences, Volume 690, Article 121539 (https://www.sciencedirect.com/science/article/pii/S0020025524014531)

Citations: 0

Abstract

Single-eye expression recognition is an important vision task that aims to decode human emotional states by carefully examining the eye region. However, conventional cameras struggle to capture the relevant biological information under demanding lighting conditions, such as dim environments, overexposed scenes, or the presence of other radiation sources. To address this, we use event camera data, a sensor modality that is robust to extreme lighting, to improve single-eye expression recognition. Specifically, we propose a novel Hierarchical Event-RGB Interaction Network (HI-Net) that fully integrates RGB and event data to overcome the extreme lighting challenges faced by this task. HI-Net contains two novel designs: the Event-RGB Semantic Interaction Mechanism (ER-SIM) and the Hierarchical Semantics Modeling (HSM) Scheme. The former enables interaction between Event and RGB modality semantics, while the latter aims to obtain high-quality semantic representations for each modality. In ER-SIM, we employ a cross-attention mechanism for information fusion, which adaptively integrates and complements multi-scale Event and RGB semantics to cope with extreme lighting conditions. In the HSM Scheme, we first explore multi-scale contextual semantics for the event and RGB modalities separately. We then apply a semantic interaction strategy to these multi-scale contextual semantics to enhance each modality's semantic representation. Extensive experiments demonstrate that HI-Net significantly outperforms many state-of-the-art methods on the single-eye expression recognition task, especially under degraded lighting conditions.
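The abstract names cross-attention as the fusion mechanism in ER-SIM and says fusion operates on multi-scale Event and RGB semantics. What follows is a minimal, illustrative sketch of bidirectional Event-RGB cross-attention applied per scale, in pure Python. It is not the authors' HI-Net implementation. Several assumptions are not stated in the abstract. Features are already extracted as token matrices with matching token counts and channel widths at each scale. Learned Q/K/V projections are replaced by identity. The two enhanced streams are averaged, and per-scale outputs are mean-pooled and concatenated. HSM's intra-modality interaction is not modeled. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature channels


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one modality,
    keys and values from the other (`context`).

    Assumption: learned Q/K/V projections are omitted (identity) for brevity.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of the other modality's tokens (values == keys here).
        out.append([sum(w * v[c] for w, v in zip(weights, context)) for c in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_scale(rgb: Matrix, event: Matrix) -> Matrix:
    """Hypothetical bidirectional fusion at one scale. Each modality queries
    the other and keeps a residual path, then the two streams are averaged.
    Assumes rgb and event have the same number of tokens."""
    rgb_enh = add(rgb, cross_attention(rgb, event))
    event_enh = add(event, cross_attention(event, rgb))
    return [[(x + y) / 2 for x, y in zip(r, e)] for r, e in zip(rgb_enh, event_enh)]


def mean_pool(tokens: Matrix) -> List[float]:
    """Average over tokens to get one vector per scale."""
    n = len(tokens)
    return [sum(t[c] for t in tokens) / n for c in range(len(tokens[0]))]


def fuse_multiscale(rgb_scales: List[Matrix], event_scales: List[Matrix]) -> List[float]:
    """Fuse each scale independently, pool, and concatenate into one descriptor."""
    descriptor: List[float] = []
    for rgb, event in zip(rgb_scales, event_scales):
        descriptor.extend(mean_pool(fuse_scale(rgb, event)))
    return descriptor


# Toy usage: a fine scale with 4 tokens and a coarse scale with 1 token, 2 channels each.
rgb_scales = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]],
    [[0.3, 0.7]],
]
event_scales = [
    [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.9, 0.1]],
    [[0.6, 0.4]],
]
desc = fuse_multiscale(rgb_scales, event_scales)
print(len(desc))  # 4 values: 2 channels x 2 scales
```

The residual path lets each modality retain its own features when the other carries little information. This matches the stated motivation of letting event data compensate for degraded RGB frames. The abstract does not say whether HI-Net makes the same residual, averaging, or pooling choices.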


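Source journal

Information Sciences (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months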

Journal description: Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an esteemed international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions. The journal serves a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in keeping up with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers share a common interest in information science, they come from varied backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.