{"title":"基于时空双分支特征引导的驾驶员注意力预测融合网络","authors":"Yuekui Zhang , Yunzuo Zhang , Yaoge Xiao , Tong Wang","doi":"10.1016/j.eswa.2025.128564","DOIUrl":null,"url":null,"abstract":"<div><div>Predicting the driver’s gaze area is crucial for safe driving in rapidly changing traffic scenarios. However, existing driver attention prediction models generally suffer from two key limitations: insufficient utilization of spatial scale features, which hinders the precise capture of critical information in the scene; the lack of effective guidance from motion information between video frames, making it difficult to assess dynamic changes in the surrounding environment accurately. To address these issues, we propose a Spatiotemporal Dual-branch Feature-guided Fusion Network (SDFF-Net). Specifically, in the spatial branch, we design a Multi-scale Feature Aggregation (MFA) module to enhance the representation of detailed features by constructing bidirectional sampling and layer-by-layer correlation paths, enabling comprehensive extraction of saliency cues across receptive fields. In the temporal branch, we introduce an Attention Transfer Mechanism (ATM) to guide temporal modeling across consecutive frames, improving the ability to capture long-distance dependencies. Finally, we fuse the spatiotemporal features and decode them to generate the predicted saliency map. Experimental results on the DADA-2000 and TDV datasets show that the proposed SDFF-Net achieves state-of-the-art performance in driver attention prediction, outperforming existing methods in multiple evaluation metrics. Benefiting from its efficient dual-branch architecture, SDFF-Net is well-suited for deployment in resource-constrained environments, providing reliable real-time attention prediction, which is of great significance for enhancing driving safety and supporting advanced driver assistance systems in complex traffic scenarios.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128564"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Spatiotemporal dual-branch feature-guided fusion network for driver attention prediction\",\"authors\":\"Yuekui Zhang , Yunzuo Zhang , Yaoge Xiao , Tong Wang\",\"doi\":\"10.1016/j.eswa.2025.128564\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Predicting the driver’s gaze area is crucial for safe driving in rapidly changing traffic scenarios. However, existing driver attention prediction models generally suffer from two key limitations: insufficient utilization of spatial scale features, which hinders the precise capture of critical information in the scene; the lack of effective guidance from motion information between video frames, making it difficult to assess dynamic changes in the surrounding environment accurately. To address these issues, we propose a Spatiotemporal Dual-branch Feature-guided Fusion Network (SDFF-Net). Specifically, in the spatial branch, we design a Multi-scale Feature Aggregation (MFA) module to enhance the representation of detailed features by constructing bidirectional sampling and layer-by-layer correlation paths, enabling comprehensive extraction of saliency cues across receptive fields. 
In the temporal branch, we introduce an Attention Transfer Mechanism (ATM) to guide temporal modeling across consecutive frames, improving the ability to capture long-distance dependencies. Finally, we fuse the spatiotemporal features and decode them to generate the predicted saliency map. Experimental results on the DADA-2000 and TDV datasets show that the proposed SDFF-Net achieves state-of-the-art performance in driver attention prediction, outperforming existing methods in multiple evaluation metrics. Benefiting from its efficient dual-branch architecture, SDFF-Net is well-suited for deployment in resource-constrained environments, providing reliable real-time attention prediction, which is of great significance for enhancing driving safety and supporting advanced driver assistance systems in complex traffic scenarios.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"292 \",\"pages\":\"Article 128564\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425021839\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425021839","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Spatiotemporal dual-branch feature-guided fusion network for driver attention prediction
Predicting the driver’s gaze area is crucial for safe driving in rapidly changing traffic scenarios. However, existing driver attention prediction models generally suffer from two key limitations: insufficient utilization of multi-scale spatial features, which hinders the precise capture of critical information in the scene, and a lack of effective guidance from inter-frame motion information, which makes it difficult to accurately assess dynamic changes in the surrounding environment. To address these issues, we propose a Spatiotemporal Dual-branch Feature-guided Fusion Network (SDFF-Net). Specifically, in the spatial branch, we design a Multi-scale Feature Aggregation (MFA) module that enhances the representation of detailed features by constructing bidirectional sampling and layer-by-layer correlation paths, enabling comprehensive extraction of saliency cues across receptive fields. In the temporal branch, we introduce an Attention Transfer Mechanism (ATM) to guide temporal modeling across consecutive frames, improving the ability to capture long-range temporal dependencies. Finally, we fuse the spatiotemporal features and decode them to generate the predicted saliency map. Experimental results on the DADA-2000 and TDV datasets show that the proposed SDFF-Net achieves state-of-the-art performance in driver attention prediction, outperforming existing methods on multiple evaluation metrics. Benefiting from its efficient dual-branch architecture, SDFF-Net is well suited for deployment in resource-constrained environments and provides reliable real-time attention prediction, which is of great significance for enhancing driving safety and supporting advanced driver assistance systems in complex traffic scenarios.
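To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of a dual-branch spatiotemporal network in the spirit of SDFF-Net: a spatial branch with a multi-scale aggregation stand-in for MFA, a temporal branch with a cross-frame attention stand-in for ATM, and a simple concatenation-plus-decoder fusion that outputs a saliency map. All module designs, class names, shapes, and hyperparameters here are illustrative assumptions; the abstract does not specify the authors' actual implementation.

```python
# Minimal PyTorch sketch of a dual-branch spatiotemporal fusion network in the
# spirit of SDFF-Net; every design detail below is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleFeatureAggregation(nn.Module):
    """Hypothetical stand-in for the MFA module: combines the input feature map
    with a coarser (down-sampled) and a finer (up-sampled) view of itself."""

    def __init__(self, channels: int):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.up = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # downward path: strided conv to a coarser scale, then resize back
        down = F.interpolate(self.down(x), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
        # upward path: bilinear upsampling to a finer scale, then pool back
        up = F.avg_pool2d(self.up(F.interpolate(
            x, scale_factor=2.0, mode="bilinear", align_corners=False)), 2)
        # a 1x1 conv fuses the original, coarser, and finer views
        return self.fuse(torch.cat([x, down, up], dim=1))


class AttentionTransferMechanism(nn.Module):
    """Hypothetical stand-in for the ATM module: lets the current frame attend
    to features from earlier frames to model long-range temporal dependencies."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, C, H, W) -> one token per (frame, pixel) position
        b, t, c, h, w = clip.shape
        tokens = clip.permute(0, 1, 3, 4, 2).reshape(b, t * h * w, c)
        out, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        # keep only the last frame's tokens as the temporally guided feature map
        return out[:, -h * w:, :].reshape(b, h, w, c).permute(0, 3, 1, 2)


class DualBranchFusionNet(nn.Module):
    """Spatial branch (MFA) + temporal branch (ATM), fused by concatenation and
    decoded into a single-channel saliency map for the most recent frame."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.spatial = MultiScaleFeatureAggregation(channels)
        self.temporal = AttentionTransferMechanism(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) video snippet ending at the current frame
        b, t = clip.shape[:2]
        feats = self.stem(clip.flatten(0, 1)).unflatten(0, (b, t))
        spatial = self.spatial(feats[:, -1])           # last frame, multi-scale
        temporal = self.temporal(feats)                # cross-frame attention
        fused = torch.cat([spatial, temporal], dim=1)  # simple fusion
        return torch.sigmoid(self.decoder(fused))      # predicted saliency map


if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 16, 16)  # tiny clip keeps the attention cheap
    print(DualBranchFusionNet()(clip).shape)  # torch.Size([2, 1, 16, 16])
```

The concatenation fusion and per-token attention above are deliberately simple placeholders; the paper's bidirectional sampling, layer-by-layer correlation paths, and attention transfer are presumably more elaborate.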
Journal description:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.