{"title":"用于驱动行为预测的蒸馏路由变压器","authors":"Jun Gao, Jiangang Yi, Yi Lu Murphey","doi":"10.4271/09-12-01-0003","DOIUrl":null,"url":null,"abstract":"<div>The uncertainty of a driver’s state, the variability of the traffic environment, and the complexity of road conditions have made driving behavior a critical factor affecting traffic safety. Accurate predicting of driving behavior is therefore crucial for ensuring safe driving. In this research, an efficient framework, distilled routing transformer (DRTR), is proposed for driving behavior prediction using multiple modality data, i.e., front view video frames and vehicle signals. First, a cross-modal attention distiller is introduced, which distills the cross-modal attention knowledge of a fusion-encoder transformer to guide the training of our DRTR and learn deep interactions between different modalities. Second, since the multi-modal learning usually requires information from the macro view to the micro view, a self-attention (SA)-routing module is custom-designed for SA layers in DRTR for dynamic scheduling of global and local attentions for each input instance. Finally, a Mogrifier long short-term memory (Mogrifier LSTM) network is employed for DRTR to predict driving behaviors. We applied our approach to real-world data collected during drives in both urban and freeway environments by an instrumented vehicle. The experimental results demonstrate that the DRTR can predict the imminent driving behavior effectively while enjoying a faster inference speed than other state-of-the-art (SOTA) baselines.</div>","PeriodicalId":42847,"journal":{"name":"SAE International Journal of Transportation Safety","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distilled Routing Transformer for Driving Behavior Prediction\",\"authors\":\"Jun Gao, Jiangang Yi, Yi Lu Murphey\",\"doi\":\"10.4271/09-12-01-0003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>The uncertainty of a driver’s state, the variability of the traffic environment, and the complexity of road conditions have made driving behavior a critical factor affecting traffic safety. Accurate predicting of driving behavior is therefore crucial for ensuring safe driving. In this research, an efficient framework, distilled routing transformer (DRTR), is proposed for driving behavior prediction using multiple modality data, i.e., front view video frames and vehicle signals. First, a cross-modal attention distiller is introduced, which distills the cross-modal attention knowledge of a fusion-encoder transformer to guide the training of our DRTR and learn deep interactions between different modalities. Second, since the multi-modal learning usually requires information from the macro view to the micro view, a self-attention (SA)-routing module is custom-designed for SA layers in DRTR for dynamic scheduling of global and local attentions for each input instance. Finally, a Mogrifier long short-term memory (Mogrifier LSTM) network is employed for DRTR to predict driving behaviors. We applied our approach to real-world data collected during drives in both urban and freeway environments by an instrumented vehicle. 
The experimental results demonstrate that the DRTR can predict the imminent driving behavior effectively while enjoying a faster inference speed than other state-of-the-art (SOTA) baselines.</div>\",\"PeriodicalId\":42847,\"journal\":{\"name\":\"SAE International Journal of Transportation Safety\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2023-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SAE International Journal of Transportation Safety\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4271/09-12-01-0003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"TRANSPORTATION SCIENCE & TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SAE International Journal of Transportation Safety","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4271/09-12-01-0003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Distilled Routing Transformer for Driving Behavior Prediction
The uncertainty of a driver’s state, the variability of the traffic environment, and the complexity of road conditions make driving behavior a critical factor in traffic safety. Accurately predicting driving behavior is therefore crucial for safe driving. In this research, an efficient framework, the distilled routing transformer (DRTR), is proposed for driving behavior prediction from multi-modal data, i.e., front-view video frames and vehicle signals. First, a cross-modal attention distiller is introduced, which distills the cross-modal attention knowledge of a fusion-encoder transformer to guide the training of our DRTR and to learn deep interactions between the modalities. Second, since multi-modal learning usually requires information ranging from the macro view to the micro view, a self-attention (SA)-routing module is custom-designed for the SA layers in DRTR, dynamically scheduling global and local attention for each input instance. Finally, a Mogrifier long short-term memory (Mogrifier LSTM) network is employed in DRTR to predict driving behaviors. We applied our approach to real-world data collected by an instrumented vehicle during drives in both urban and freeway environments. The experimental results demonstrate that DRTR predicts imminent driving behavior effectively while achieving faster inference than other state-of-the-art (SOTA) baselines.
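To illustrate the first component, below is a minimal sketch of attention-map distillation: the student's cross-modal attention distributions are pulled toward those of a frozen fusion-encoder teacher via KL divergence. This is a generic knowledge-distillation formulation assumed for illustration; the paper's exact loss and layer pairing are not given here, and `attention_distill_loss` is a hypothetical name.

```python
# Sketch of cross-modal attention distillation: match the student's
# attention distributions to a frozen teacher's using KL divergence.
import torch
import torch.nn.functional as F

def attention_distill_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor) -> torch.Tensor:
    # Both tensors: (batch, heads, query_len, key_len) raw attention
    # scores from a matched pair of cross-modal attention layers.
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach(), dim=-1)  # no teacher grads
    # KL(teacher || student); 'batchmean' divides the summed KL by batch size.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```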
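The SA-routing module is described only at a high level in the abstract. The following is a speculative sketch of one way to schedule global and local attention per input instance: a lightweight learned gate mixes the outputs of the two branches. The `SARouter` name and the pooled softmax gating scheme are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class SARouter(nn.Module):
    """Per-instance soft routing between global and local attention branches."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, 2), nn.Softmax(dim=-1))

    def forward(self, x, global_attn: nn.Module, local_attn: nn.Module):
        # x: (batch, seq, dim). Mean-pool the sequence so the routing
        # weights are computed once per input instance.
        w = self.gate(x.mean(dim=1))                  # (batch, 2)
        out_g = global_attn(x)                        # full self-attention
        out_l = local_attn(x)                         # e.g., windowed attention
        return w[:, 0, None, None] * out_g + w[:, 1, None, None] * out_l
```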
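The Mogrifier LSTM (Melis et al., ICLR 2020) used as the prediction head modifies a standard LSTM by letting the input and hidden state gate each other for several alternating rounds before the cell update. A minimal single-cell sketch, with the round count and shapes chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class MogrifierLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, rounds: int = 5):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        self.rounds = rounds
        # Odd rounds gate x with h (Q); even rounds gate h with x (R).
        self.q = nn.ModuleList(nn.Linear(hidden_size, input_size, bias=False)
                               for _ in range((rounds + 1) // 2))
        self.r = nn.ModuleList(nn.Linear(input_size, hidden_size, bias=False)
                               for _ in range(rounds // 2))

    def forward(self, x, state):
        h, c = state
        for i in range(1, self.rounds + 1):  # alternating "mogrification"
            if i % 2 == 1:
                x = 2 * torch.sigmoid(self.q[i // 2](h)) * x
            else:
                h = 2 * torch.sigmoid(self.r[i // 2 - 1](x)) * h
        return self.lstm(x, (h, c))
```

In a behavior-prediction pipeline of this kind, the updated hidden state would then feed a classifier over the driving-behavior classes.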