DDC-Chat：通过视觉语言模型的指令调优，实现对分心司机的准确分类

IF 3.4 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

安全科学与韧性(英文) Pub Date : 2024-11-22 DOI:10.1016/j.jnlssr.2024.10.001

Chupei Liao, Kuoyi Lin

{"title":"DDC-Chat：通过视觉语言模型的指令调优，实现对分心司机的准确分类","authors":"Chupei Liao, Kuoyi Lin","doi":"10.1016/j.jnlssr.2024.10.001","DOIUrl":null,"url":null,"abstract":"<div><div>Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in <strong>D</strong>istracted <strong>D</strong>riving <strong>C</strong>lassification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a <strong>V</strong>isual large <strong>L</strong>anguage <strong>M</strong>odel (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.</div></div>","PeriodicalId":62710,"journal":{"name":"安全科学与韧性(英文)","volume":"6 2","pages":"Pages 250-264"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model\",\"authors\":\"Chupei Liao, Kuoyi Lin\",\"doi\":\"10.1016/j.jnlssr.2024.10.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in <strong>D</strong>istracted <strong>D</strong>riving <strong>C</strong>lassification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a <strong>V</strong>isual large <strong>L</strong>anguage <strong>M</strong>odel (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.</div></div>\",\"PeriodicalId\":62710,\"journal\":{\"name\":\"安全科学与韧性(英文)\",\"volume\":\"6 2\",\"pages\":\"Pages 250-264\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"安全科学与韧性(英文)\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666449624000781\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"安全科学与韧性(英文)","FirstCategoryId":"1087","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666449624000781","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}

引用次数: 0

摘要

驾驶员行为是影响道路安全的关键因素，因此需要先进的分心驾驶分类方法。在本研究中，我们引入了一种新的基于视觉大语言模型（VLM）的DDC-Chat分类方法。DDC-Chat是建立在LLAVA-Plus基础上的交互式多模式系统，专门针对分心驾驶检测进行了微调。它利用逻辑推理链来激活视觉技能，包括分割和姿态检测，通过端到端训练。此外，指令调整允许DDC-Chat不断融入新的视觉技能，增强其分类分心驾驶行为的能力。我们广泛的实验表明，DDC- chat在公共DDC数据集上实现了最先进的性能，超过了以前的基准。在对100个驾驶员数据集的评估中，该模型在零射击和少射击学习环境中都表现出优异的结果，通过准确识别驾驶员分心，将其建立为提高驾驶安全性的有价值工具。由于推理的计算强度，DDC-Chat针对远程服务器的部署进行了优化，并使用来自车载监控系统的数据流进行实时分析。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model

Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a Visual large Language Model (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for addressing distracted driving detection. It utilizes logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Due to the computational intensity of inference, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

安全科学与韧性(英文) Management Science and Operations Research, Safety, Risk, Reliability and Quality, Safety Research

CiteScore

8.70

自引率

0.00%

发文量

审稿时长

72 days