{"title":"基于视频转换器的多视角多模式驾驶员分心检测对比学习","authors":"Hong Vin Koay, Joon Huang Chuah, C. Chow","doi":"10.1109/TENSYMP55890.2023.10223643","DOIUrl":null,"url":null,"abstract":"Distracted drivers are more likely to get involved in a fatal accident. Thus, detecting actions that may led to distraction should be prioritized to reduce road accidents. However, there are many actions that cause a driver to pivot his attention away from the road. Previous works on detecting distracted drivers are done through a defined set of actions that are considered as distraction. This type of dataset is known as ‘closed set’ since there are still many distraction actions that were not considered by the model. Being different from previous datasets and approaches, in this work, we utilize constructive learning to detect distractions through multiview and multimodal video. The dataset used is the Driver Anomaly Detection dataset. The model is tasked to identify normal and anomalous driving condition in an ‘open set’ manner, where there are unseen anomalous driving condition in the test set. We use Video Transformer as the backbone of the model and validate that the performance is better than convolutional-based backbone. Two views (front and top) of driving clips on two modalities (IR and depth) are used to train individual model. The results of different views and modalities are subsequently fused together. Our method achieves 0.9892 AUC and 97.02% accuracy with Swin-Tiny when considering both views and modalities.","PeriodicalId":314726,"journal":{"name":"2023 IEEE Region 10 Symposium (TENSYMP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Contrastive Learning with Video Transformer for Driver Distraction Detection through Multiview and Multimodal Video\",\"authors\":\"Hong Vin Koay, Joon Huang Chuah, C. Chow\",\"doi\":\"10.1109/TENSYMP55890.2023.10223643\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distracted drivers are more likely to get involved in a fatal accident. Thus, detecting actions that may led to distraction should be prioritized to reduce road accidents. However, there are many actions that cause a driver to pivot his attention away from the road. Previous works on detecting distracted drivers are done through a defined set of actions that are considered as distraction. This type of dataset is known as ‘closed set’ since there are still many distraction actions that were not considered by the model. Being different from previous datasets and approaches, in this work, we utilize constructive learning to detect distractions through multiview and multimodal video. The dataset used is the Driver Anomaly Detection dataset. The model is tasked to identify normal and anomalous driving condition in an ‘open set’ manner, where there are unseen anomalous driving condition in the test set. We use Video Transformer as the backbone of the model and validate that the performance is better than convolutional-based backbone. Two views (front and top) of driving clips on two modalities (IR and depth) are used to train individual model. The results of different views and modalities are subsequently fused together. 
Our method achieves 0.9892 AUC and 97.02% accuracy with Swin-Tiny when considering both views and modalities.\",\"PeriodicalId\":314726,\"journal\":{\"name\":\"2023 IEEE Region 10 Symposium (TENSYMP)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Region 10 Symposium (TENSYMP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TENSYMP55890.2023.10223643\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Region 10 Symposium (TENSYMP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TENSYMP55890.2023.10223643","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Contrastive Learning with Video Transformer for Driver Distraction Detection through Multiview and Multimodal Video
Distracted drivers are more likely to be involved in a fatal accident, so detecting actions that may lead to distraction should be prioritized to reduce road accidents. However, many different actions can divert a driver's attention away from the road. Previous work on distracted-driver detection relies on a predefined set of actions labeled as distractions; such datasets are known as 'closed set', since many distracting actions remain outside the model's label space. Departing from previous datasets and approaches, in this work we use contrastive learning to detect distraction from multiview and multimodal video. We use the Driver Anomaly Detection dataset, and the model is tasked with separating normal from anomalous driving conditions in an 'open set' manner: the test set contains anomalous driving conditions unseen during training. We adopt a Video Transformer as the model backbone and validate that it outperforms convolutional backbones. Two views (front and top) of driving clips in two modalities (IR and depth) are used to train individual models, and the scores from the different views and modalities are subsequently fused. Our method achieves 0.9892 AUC and 97.02% accuracy with Swin-Tiny when considering both views and modalities.
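To make the contrastive objective and the late fusion concrete, below is a minimal PyTorch sketch. It assumes a video backbone that returns pooled clip features (e.g., a Video Swin-Tiny from an off-the-shelf library); the projection head, the temperature value, the "normality center" formulation of the loss, and the score-averaging fusion are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

class ContrastiveHead(torch.nn.Module):
    # Projects pooled backbone features onto the unit hypersphere,
    # where cosine similarity is a meaningful distance.
    def __init__(self, in_dim=768, out_dim=128):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

def contrastive_loss(normal_emb, anomalous_emb, temperature=0.1):
    # Pull normal-driving embeddings toward their batch mean (a
    # "normality center") and push anomalous embeddings away from it.
    center = F.normalize(normal_emb.mean(dim=0, keepdim=True), dim=-1)
    pos = torch.exp(normal_emb @ center.t() / temperature)     # (N, 1)
    neg = torch.exp(anomalous_emb @ center.t() / temperature)  # (M, 1)
    return -torch.log(pos.sum() / (pos.sum() + neg.sum()))

def fused_score(clips, models, centers):
    # Late fusion across the four view/modality models: each model
    # scores its own clip by cosine similarity to its normality
    # center, and the per-model scores are averaged. Sweeping a
    # threshold over the fused score yields accuracy and AUC.
    scores = []
    for (backbone, head), center, clip in zip(models, centers, clips):
        emb = head(backbone(clip))          # (1, out_dim), unit norm
        scores.append((emb @ center.t()).squeeze())
    return torch.stack(scores).mean()

Under this formulation, a clip is flagged as anomalous when its fused similarity to normal driving falls below a threshold. An unseen distraction type needs no dedicated label, only a low similarity to normal driving, which is what makes the open-set evaluation possible.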