Target-Oriented Environmental Context Modeling for Pedestrian Risky Behavior Detection

IF 6.4 · Zone 2 (Computer Science) · JCR Q1 (Automation & Control Systems)

IEEE Transactions on Automation Science and Engineering · Pub Date: 2025-03-21 · DOI: 10.1109/TASE.2025.3553495

Zhenyu Shi;Shibo He;Meng Zhang;Kun Shi

Volume 22, pages 13373-13386 · Publication type: Journal Article · Full text: https://ieeexplore.ieee.org/document/10937104/

Citations: 0

Abstract



In autonomous driving systems, detecting risky pedestrian behavior is crucial for ensuring the safety of human-vehicle interactions. Pedestrian behavior is determined not only by individual actions but also by interactions with the surrounding environment. Given the impact of environmental context on pedestrian behavior, recent studies have proposed various methods for extracting contextual information. However, efficiently modeling the environmental context features of pedestrian behavior remains a challenge. In this paper, we propose a target-oriented environmental context modeling method that accounts for the role of target-specific features in constructing context features, achieving target-adaptive context feature extraction. Our solution, called State DEtection TRansformer (SDETR), provides an end-to-end framework for risky pedestrian behavior detection. First, we devise a Dual-level Feature Encoder that decouples high-level target semantics from low-level environmental texture. Specifically, the texture encoding enables label-free environmental feature extraction. Then, we develop an Object-Environment Perception Decoder that flexibly decodes cross-feature-domain contextual features using object features. Finally, a Feature Fusion Head merges object features with environmental state features to produce the detection output. Experiments demonstrate the performance of SDETR on two typical risky behavior detection tasks (crossing detection and intrusion detection). We report a new record of 87.3% accuracy on the JAAD dataset and 76.3% accuracy on the Cityintrusion dataset, both of which significantly outperform all previously published results.

Note to Practitioners: The motivation of this paper is to detect risky pedestrian behavior in traffic scenes, with a particular focus on constructing the pedestrian's environmental context. Existing methods for extracting environmental context typically involve complex feature extraction or label descriptions of the background, followed by a mechanical construction of the interplay between pedestrians and their surroundings. This paper proposes a flexible and cost-effective method for modeling environmental context. We employ an attention mechanism to autonomously decode cross-feature-domain contextual state features, using target features as query features. Moreover, we employ dual-level feature extraction for targets and backgrounds (target semantics and background textures), significantly reducing the labeling cost of environmental description. Preliminary experiments suggest that this method is feasible for detecting two risky pedestrian behaviors: pedestrian crossing and pedestrian intrusion. However, it has not yet been extended to other risky pedestrian behavior tasks. In future research, we intend to study the recognition of unlabeled abnormal and risky pedestrian behaviors, expanding beyond the current scope.
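
The idea of using target features as attention queries over environmental features can be illustrated with a minimal sketch. This is not the authors' SDETR implementation: the abstract gives no layer details, so the function names (`cross_attention`, `fuse`), the toy feature vectors, the absence of learned projections and multiple heads, and the use of plain concatenation as the fusion step are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(object_queries, env_features):
    """Scaled dot-product cross-attention.

    Each object (target) feature acts as a query; the environment
    features act as both keys and values. Learned projection matrices
    are omitted here (an assumption made to keep the sketch short).
    """
    dim = len(env_features[0])
    scale = math.sqrt(dim)
    contexts, all_weights = [], []
    for query in object_queries:
        weights = softmax([dot(query, key) / scale for key in env_features])
        # Context = attention-weighted average of the environment features.
        context = [
            sum(w * value[i] for w, value in zip(weights, env_features))
            for i in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


def fuse(object_feature, context_feature):
    """Hypothetical fusion step: concatenate object and context features."""
    return object_feature + context_feature


# Toy inputs (hypothetical): two pedestrians, three environment patches.
object_queries = [[1.0, 0.0], [0.0, 1.0]]
env_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

contexts, weights = cross_attention(object_queries, env_features)
fused = [fuse(obj, ctx) for obj, ctx in zip(object_queries, contexts)]
```

Because each pedestrian supplies its own query, the same set of environment features yields a different context vector per pedestrian, which is the sense in which the context extraction is target-adaptive. A classifier over `fused` would then produce the per-pedestrian risk label; that part is left out of the sketch.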


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.