Target-Oriented Environmental Context Modeling for Pedestrian Risky Behavior Detection
Authors: Zhenyu Shi; Shibo He; Meng Zhang; Kun Shi
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 13373-13386
Published: 2025-03-21
DOI: 10.1109/TASE.2025.3553495
URL: https://ieeexplore.ieee.org/document/10937104/
Impact factor: 6.4; JCR: Q1 (Automation & Control Systems)
In autonomous driving systems, detecting risky pedestrian behavior is crucial for the safety of human-vehicle interactions. Pedestrian behavior is determined not only by individual actions but also by interactions with the surrounding environment. Recent studies have therefore proposed various methods for extracting contextual information, yet efficiently modeling the environmental context of pedestrian behavior remains a challenge. In this paper, we propose a target-oriented environmental context modeling method that accounts for the role of target-specific features in constructing context features, achieving target-adaptive context feature extraction. Our solution, called State DEtection TRansformer (SDETR), provides an end-to-end framework for risky pedestrian behavior detection. First, we devise a Dual-level Feature Encoder that decouples high-level target semantics from low-level environmental texture; the texture encoding enables label-free environmental feature extraction. Then, we develop an Object-Environment Perception Decoder that uses object features to decode contextual features across feature domains. Finally, a Feature Fusion Head merges object features with environmental state features to produce the detection output. Experiments evaluate SDETR on two typical risky behavior detection tasks, crossing detection and intrusion detection. We report 87.3% accuracy on the JAAD dataset and 76.3% accuracy on the Cityintrusion dataset, a new record that significantly outperforms all previously published results.
Note to Practitioners: The motivation of this paper is to detect risky pedestrian behavior in traffic scenes, with a particular focus on constructing the pedestrian's environmental context. Existing methods for extracting environmental context typically rely on complex feature extraction or labeled descriptions of the background, and then establish the interplay between pedestrians and their surroundings mechanistically. This paper proposes a flexible and cost-effective method for modeling environmental context. We employ an attention mechanism that uses target features as queries to autonomously decode contextual state features across feature domains. Moreover, we extract features at two levels, target semantics and background textures, which significantly reduces the labeling cost of describing the environment. Preliminary experiments suggest that the method is feasible for detecting two risky pedestrian behaviors: pedestrian crossing and pedestrian intrusion. It has not yet been extended to other dangerous pedestrian behavior tasks. In future research, we intend to study the recognition of unlabeled abnormal and risky pedestrian behaviors.
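The Note to Practitioners describes an attention mechanism that uses target features as queries to decode contextual state features. As a minimal sketch of that general idea, and not the authors' SDETR implementation, the pure-Python example below runs scaled dot-product cross-attention in which object feature vectors act as queries and environment feature vectors act as keys and values. The function names (`cross_attention`, `fuse`), the toy vectors, and the use of concatenation for fusion are assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of d-dim object (target) feature vectors
    keys, values: equal-length lists of environment feature vectors
    Returns one context vector per query, plus the attention weights.
    """
    d = len(queries[0])
    contexts, weights = [], []
    for q in queries:
        # Similarity of this object query to every environment key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Context = attention-weighted sum of the environment values.
        ctx = [sum(wj * v[i] for wj, v in zip(w, values))
               for i in range(len(values[0]))]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


def fuse(object_feats, context_feats):
    """Hypothetical fusion step: concatenate object and context vectors."""
    return [o + c for o, c in zip(object_feats, context_feats)]


# Toy data (assumed): two objects, three environment regions, d = 2.
object_feats = [[1.0, 0.0], [0.0, 1.0]]
env_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

contexts, weights = cross_attention(object_feats, env_feats, env_feats)
fused = fuse(object_feats, contexts)
```

Because each object supplies its own query, the two objects here weight the same three environment regions differently, which is the sense in which the context is target-adaptive. A real model would apply learned projections to the queries, keys, and values and would typically use multiple attention heads.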
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.