Long–Short Observation-driven Prediction Network for pedestrian crossing intention prediction with momentary observation

IF 6.5 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2024-11-08 DOI:10.1016/j.neucom.2024.128824

Hui Liu, Chunsheng Liu, Faliang Chang, Yansha Lu, Minhang Liu

Neurocomputing, Volume 614, Article 128824. https://www.sciencedirect.com/science/article/pii/S0925231224015959

Citations: 0

Abstract

Pedestrian crossing intention prediction aims to predict whether a pedestrian will cross the road, which is crucial for the decision-making of intelligent vehicles and for ensuring traffic safety. Existing methods rely on long-term observation and rarely consider how difficult it is to obtain sufficiently long and precise observations in real-world scenarios. Focusing on momentary observation, which contains only two frames (the preceding and the current one), we propose a novel Long–Short Observation-driven Prediction Network (LSOP-Net). LSOP-Net comprises two critical components: the Momentary Observation feature Extraction Module (MOE-Module) and the Multimodal Long–Short-term feature Fusion Module (MLSFusion). Using a hybrid training strategy and an external long-term feature pool, the MOE-Module extracts features with long-term patterns from momentary observations, which mitigates the feature deficiency arising from such short observations. Based on a feature selection fusion mechanism, MLSFusion explicitly models the importance relationship between each modality's long–short-term features and the output, and adaptively fuses the long–short-term features across modalities. Experimental results on the JAAD and PIE datasets demonstrate that our approach achieves superior performance in pedestrian crossing intention prediction with momentary observation.
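The abstract does not say how the long-term feature pool is queried or how the selection weights in MLSFusion are computed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a dot-product attention lookup over the pool and a softmax gate over candidate features; the function names `retrieve_long_term` and `selection_fusion`, the `gate_vectors` parameter, and all shapes are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_long_term(short_feat, pool):
    """Assumed lookup: weight each pooled long-term feature by its
    similarity to the momentary (two-frame) feature, then average."""
    weights = softmax([dot(short_feat, entry) for entry in pool])
    dim = len(short_feat)
    return [sum(w * entry[i] for w, entry in zip(weights, pool))
            for i in range(dim)]


def selection_fusion(features, gate_vectors):
    """Assumed gate: score each candidate feature with its own gate
    vector (learned in a real model), normalise the scores, and
    return the weighted sum together with the weights."""
    weights = softmax([dot(f, g) for f, g in zip(features, gate_vectors)])
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(dim)]
    return fused, weights
```

In this sketch, a short-term feature from one modality would first be passed to `retrieve_long_term` to obtain a long-term counterpart, and the short-term and long-term features of all modalities would then be passed together to `selection_fusion`. The returned weights make the relative importance of each candidate explicit, which is the property the abstract attributes to MLSFusion.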




Source journal

Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.