{"title":"基于车载传感器的自我中心驾驶视频的驾驶行为感知字幕生成*","authors":"Hongkuan Zhang, Koichi Takeda, Ryohei Sasano, Yusuke Adachi, Kento Ohtani","doi":"10.1109/ivworkshops54471.2021.9669259","DOIUrl":null,"url":null,"abstract":"Video captioning aims to generate textual descriptions according to the video contents. The risk assessment of autonomous driving vehicles has become essential for an insurance company for providing adequate insurance coverage, in particular, for emerging MaaS business. The insurers need to assess the risk of autonomous driving business plans with a fixed route by analyzing a large number of driving data, including videos recorded by dash cameras and sensor signals. To make the process more efficient, generating captions for driving videos can provide insurers concise information to understand the video contents quickly. A natural problem with driving video captioning is, since the absence of egovehicles in these egocentric videos, descriptions of latent driving behaviors are difficult to be grounded in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose to incorporate in-vehicle sensors which encapsulate the driving behavior information to assist the caption generation. 
We evaluate our method on the Japanese driving video captioning dataset called City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors on improving the overall performance of generated captions, especially on generating more accurate descriptions for the driving behaviors.","PeriodicalId":256905,"journal":{"name":"2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Driving Behavior Aware Caption Generation for Egocentric Driving Videos Using In-Vehicle Sensors*\",\"authors\":\"Hongkuan Zhang, Koichi Takeda, Ryohei Sasano, Yusuke Adachi, Kento Ohtani\",\"doi\":\"10.1109/ivworkshops54471.2021.9669259\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video captioning aims to generate textual descriptions according to the video contents. The risk assessment of autonomous driving vehicles has become essential for an insurance company for providing adequate insurance coverage, in particular, for emerging MaaS business. The insurers need to assess the risk of autonomous driving business plans with a fixed route by analyzing a large number of driving data, including videos recorded by dash cameras and sensor signals. To make the process more efficient, generating captions for driving videos can provide insurers concise information to understand the video contents quickly. A natural problem with driving video captioning is, since the absence of egovehicles in these egocentric videos, descriptions of latent driving behaviors are difficult to be grounded in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose to incorporate in-vehicle sensors which encapsulate the driving behavior information to assist the caption generation. 
We evaluate our method on the Japanese driving video captioning dataset called City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors on improving the overall performance of generated captions, especially on generating more accurate descriptions for the driving behaviors.\",\"PeriodicalId\":256905,\"journal\":{\"name\":\"2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ivworkshops54471.2021.9669259\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ivworkshops54471.2021.9669259","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Driving Behavior Aware Caption Generation for Egocentric Driving Videos Using In-Vehicle Sensors*
Video captioning aims to generate textual descriptions of video contents. Risk assessment of autonomous driving vehicles has become essential for insurance companies to provide adequate coverage, in particular for emerging MaaS businesses. Insurers need to assess the risk of autonomous driving business plans on fixed routes by analyzing large amounts of driving data, including videos recorded by dash cameras and sensor signals. To make this process more efficient, generating captions for driving videos can provide insurers with concise information for understanding the video contents quickly. A natural problem with driving video captioning is that, because the ego-vehicle is absent from these egocentric videos, descriptions of latent driving behaviors are difficult to ground in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose to incorporate in-vehicle sensors, which encapsulate the driving behavior information, to assist caption generation. We evaluate our method on the Japanese driving video captioning dataset called City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors in improving the overall performance of generated captions, especially in generating more accurate descriptions of the driving behaviors.
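The abstract does not specify how the sensor signals are combined with the visual stream. A minimal sketch of one common option, early fusion of synchronized per-frame visual features with in-vehicle sensor channels (e.g. speed, steering), is shown below; all array shapes, feature names, and the fusion scheme are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: per-frame visual features (T frames x Dv dims) and
# time-aligned in-vehicle sensor readings (T x Ds, e.g. speed, steering
# angle, brake pressure). Shapes chosen purely for illustration.
T, Dv, Ds = 8, 16, 3
video_feats = rng.normal(size=(T, Dv))
sensor_feats = rng.normal(size=(T, Ds))

def fuse_features(video, sensors):
    """Early fusion: concatenate visual and sensor features frame by
    frame, then mean-pool over time into one clip-level vector that a
    caption decoder could condition on."""
    fused = np.concatenate([video, sensors], axis=1)  # (T, Dv + Ds)
    return fused.mean(axis=0)                         # (Dv + Ds,)

clip_vec = fuse_features(video_feats, sensor_feats)
print(clip_vec.shape)  # (19,)
```

In such a setup the sensor channels carry behavior cues (acceleration, turning) that the egocentric video cannot show directly, which is the motivation the abstract gives for adding them.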