{"title":"DDC-Chat:通过视觉语言模型的指令调优,实现对分心司机的准确分类","authors":"Chupei Liao, Kuoyi Lin","doi":"10.1016/j.jnlssr.2024.10.001","DOIUrl":null,"url":null,"abstract":"<div><div>Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in <strong>D</strong>istracted <strong>D</strong>riving <strong>C</strong>lassification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a <strong>V</strong>isual large <strong>L</strong>anguage <strong>M</strong>odel (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.</div></div>","PeriodicalId":62710,"journal":{"name":"安全科学与韧性(英文)","volume":"6 2","pages":"Pages 250-264"},"PeriodicalIF":3.7000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model\",\"authors\":\"Chupei Liao, Kuoyi Lin\",\"doi\":\"10.1016/j.jnlssr.2024.10.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in <strong>D</strong>istracted <strong>D</strong>riving <strong>C</strong>lassification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a <strong>V</strong>isual large <strong>L</strong>anguage <strong>M</strong>odel (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.</div></div>\",\"PeriodicalId\":62710,\"journal\":{\"name\":\"安全科学与韧性(英文)\",\"volume\":\"6 2\",\"pages\":\"Pages 250-264\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"安全科学与韧性(英文)\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666449624000781\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"安全科学与韧性(英文)","FirstCategoryId":"1087","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666449624000781","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model
Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a Visual large Language Model (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.