DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model

Impact factor: 3.7 · JCR Q1 (Public, Environmental & Occupational Health)
Chupei Liao, Kuoyi Lin
Journal: Journal of Safety Science and Resilience (安全科学与韧性(英文)), Volume 6, Issue 2, Pages 250-264
Published: 2024-11-22 (Journal Article)
DOI: 10.1016/j.jnlssr.2024.10.001
URL: https://www.sciencedirect.com/science/article/pii/S2666449624000781
Citations: 0

Abstract

Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a Visual large Language Model (VLM). DDC-Chat is an interactive multimodal system built upon LLaVA-Plus and fine-tuned specifically for distracted driving detection. Trained end to end, it uses logical reasoning chains to activate visual skills, including segmentation and pose detection. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning settings, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Because inference is computationally intensive, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.
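The abstract describes reasoning chains that activate visual skills and a mechanism for adding new skills over time. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the skill registry, the stub skills, the tag names, and the class labels are all hypothetical, and the stubs return canned tags where a real system would call vision models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical visual skills. Each takes an image identifier and returns a set
# of observation tags. These are stubs with canned output, not real models.
def segment_stub(image_id: str) -> Set[str]:
    return {"phone_visible"}

def pose_stub(image_id: str) -> Set[str]:
    return {"hand_at_ear"}

# Hypothetical registry mapping a skill name to its callable.
SKILLS: Dict[str, Callable[[str], Set[str]]] = {
    "segmentation": segment_stub,
    "pose_detection": pose_stub,
}

def register_skill(name: str, fn: Callable[[str], Set[str]]) -> None:
    """Add a new skill without changing the chain runner."""
    SKILLS[name] = fn

@dataclass
class Step:
    thought: str  # free-text reasoning for this step
    skill: str    # name of the skill to activate

def run_chain(image_id: str, plan: List[Step]) -> Set[str]:
    """Run each step's skill in order and merge the observation tags."""
    tags: Set[str] = set()
    for step in plan:
        if step.skill not in SKILLS:
            raise KeyError(f"unknown skill: {step.skill}")
        tags |= SKILLS[step.skill](image_id)
    return tags

def classify(tags: Set[str]) -> str:
    """Toy rule-based label; a VLM would produce this from the observations."""
    if {"phone_visible", "hand_at_ear"} <= tags:
        return "talking on phone"
    if "phone_visible" in tags:
        return "texting"
    return "safe driving"

if __name__ == "__main__":
    plan = [
        Step("Locate objects near the driver's hands.", "segmentation"),
        Step("Check the driver's arm posture.", "pose_detection"),
    ]
    print(classify(run_chain("frame_001", plan)))
```

Keeping the skills behind a name-to-callable registry is one simple way to model the extensibility the abstract mentions: a new skill can be added with `register_skill` and referenced by name in a plan without touching `run_chain`.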
Source journal
Journal of Safety Science and Resilience (安全科学与韧性(英文))
Subject areas: Management Science and Operations Research; Safety, Risk, Reliability and Quality; Safety Research
CiteScore: 8.70
Self-citation rate: 0.00%
Articles published: 0
Review time: 72 days