Automatic speech-based alcohol intoxication detection for automotive safety applications

IF 3.4 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language Pub Date : 2025-08-09 DOI:10.1016/j.csl.2025.101872

Brian Stasak, Julien Epps

Volume 96, Article 101872. Full text: https://www.sciencedirect.com/science/article/pii/S088523082500097X

Citations: 0

Abstract

There is a responsibility to advance automatic alcohol intoxication screening capabilities in modern automobiles to reduce the high rate of alcohol-related accidents and fatalities worldwide. Automatic speech-based alcohol intoxication screening offers a tremendous safety opportunity in the automotive industry due to its non-invasive convenience, comparatively low cost, and rapid result processing. Using the Alcohol Language Corpus (ALC), this study examines automatic alcohol intoxication classification based on participants' non-intoxicated/intoxicated omni-microphone speech recordings. Experiments with many different speech features (e.g., glottal, landmarks, linguistic, prosodic, spectral, syllabic, vocal tract coordination) across different blood alcohol concentration (BAC) ranges and specific verbal tasks show significant changes as participants' BAC increases. Compared with their non-intoxicated recordings, participants' intoxicated recordings show a lower average fundamental frequency (F0) together with increased F0 frequency modulation and increased breathy and creaky voice qualities. For the picture description and tongue twister tasks, manual irregularity disfluency and pause linguistic features significantly increase in intoxicated recordings. Further, for all verbal tasks, automatically extracted syllabic pause features show a significant increase in intoxicated recordings. A task-dependent support vector machine classifier with a ≥0.001 BAC 'intoxication' sensitivity threshold improves alcohol intoxication classification by up to 8% absolute over a task-agnostic approach. Moreover, task-dependent modeling with a majority-vote decision improves classification accuracy by up to 20% absolute, depending on the task, compared with the file-by-file task-agnostic results previously reported in ALC baseline studies, which used higher-quality headset microphone recordings.
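The abstract names two decision rules without detailing them: a ≥0.001 BAC labelling threshold and a majority vote over task-dependent classifier outputs. The following is a minimal sketch of both, not the authors' implementation. The function names, the task names used as dictionary keys, the pooling of votes per task, and the tie-breaking rule are all assumptions made for illustration; the per-recording predictions would come from a classifier such as an SVM, which is not reproduced here.

```python
from collections import Counter
from typing import Dict, List

# Threshold taken from the abstract: BAC at or above this value is labelled 'intoxicated'.
BAC_THRESHOLD = 0.001


def label_from_bac(bac: float, threshold: float = BAC_THRESHOLD) -> int:
    """Return 1 (intoxicated) if the BAC meets the threshold, else 0."""
    return int(bac >= threshold)


def majority_vote(predictions: List[int]) -> int:
    """Fuse binary per-recording decisions into a single decision.

    Ties resolve to 1 (intoxicated). This cautious tie-break is an
    assumption of this sketch, not something stated in the abstract.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    return int(counts[1] >= counts[0])


def fuse_by_task(task_predictions: Dict[str, List[int]]) -> Dict[str, int]:
    """Apply the majority vote separately to each task's predictions."""
    return {task: majority_vote(preds) for task, preds in task_predictions.items()}


if __name__ == "__main__":
    # Hypothetical per-recording outputs from task-dependent classifiers.
    session = {
        "picture_description": [1, 1, 0],
        "tongue_twister": [0, 0, 1],
    }
    print(fuse_by_task(session))
```

Voting over several recordings of the same task lets a single misclassified file be outvoted, which is one plausible reason a fused decision could outperform file-by-file results.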


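Source journal

Computer Speech and Language (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

11.30

Self-citation rate

4.70%

Articles published

Review time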

22.9 weeks

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.