DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model
Authors: Chupei Liao, Kuoyi Lin
Journal: 安全科学与韧性(英文), vol. 6, no. 2, pages 250-264 (impact factor 3.7; JCR Q1, Public, Environmental & Occupational Health)
Publication type: Journal Article
Published: 2024-11-22
DOI: 10.1016/j.jnlssr.2024.10.001
URL: https://www.sciencedirect.com/science/article/pii/S2666449624000781
Citations: 0
Abstract
Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a Visual large Language Model (VLM). DDC-Chat is an interactive multimodal system built upon LLaVA-Plus and fine-tuned specifically for distracted driving detection. It uses logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Because inference is computationally intensive, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.
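
The idea of a reasoning chain that activates visual skills can be illustrated with a minimal Python sketch. It is not the authors' implementation: the skill functions, the `ReasoningStep` type, the registry, and the keyword rule in `classify` are all hypothetical stand-ins, and the canned strings replace what real segmentation and pose models would return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical skill signature: takes a frame identifier, returns a text observation.
Skill = Callable[[str], str]


def segmentation_skill(image_id: str) -> str:
    # Stand-in for a real segmentation model; returns a canned description.
    return f"segmentation({image_id}): phone region near right hand"


def pose_skill(image_id: str) -> str:
    # Stand-in for a real pose detector; returns a canned description.
    return f"pose({image_id}): right hand raised to ear"


# Illustrative registry mapping skill names to callables (assumed, not from the paper).
SKILLS: Dict[str, Skill] = {
    "segmentation": segmentation_skill,
    "pose": pose_skill,
}


@dataclass
class ReasoningStep:
    thought: str  # free-text rationale for this step
    skill: str    # name of the skill the step activates


def run_chain(image_id: str, steps: List[ReasoningStep],
              skills: Dict[str, Skill] = SKILLS) -> List[str]:
    """Run each step's skill in order and collect the observations."""
    observations = []
    for step in steps:
        if step.skill not in skills:
            raise KeyError(f"unknown skill: {step.skill}")
        observations.append(skills[step.skill](image_id))
    return observations


def classify(observations: List[str]) -> str:
    # Toy keyword rule standing in for the model's final answer.
    text = " ".join(observations).lower()
    if "phone" in text:
        return "distracted: phone use"
    return "safe driving"


if __name__ == "__main__":
    chain = [
        ReasoningStep("Locate objects near the driver's hands", "segmentation"),
        ReasoningStep("Check the driver's arm posture", "pose"),
    ]
    print(classify(run_chain("frame_001", chain)))
```

In this sketch the chain is hard-coded; in a system like the one described, the model itself would decide which skills to call and would produce the final label from their outputs.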