Title: Automatic speech-based alcohol intoxication detection for automotive safety applications
Authors: Brian Stasak, Julien Epps
Journal: Computer Speech and Language, vol. 96, Article 101872 (Impact Factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-08-09
DOI: 10.1016/j.csl.2025.101872
URL: https://www.sciencedirect.com/science/article/pii/S088523082500097X
Citations: 0
Abstract
There is a responsibility to advance automatic alcohol intoxication screening capabilities in modern automobiles to reduce the high rate of alcohol-related accidents and fatalities worldwide. Automatic speech-based alcohol intoxication screening offers a tremendous safety opportunity in the automotive industry because it is non-invasive, comparatively inexpensive, and fast to process. Using the Alcohol Language Corpus (ALC), this study examines automatic alcohol intoxication classification based on participants’ non-intoxicated and intoxicated omni-microphone speech recordings. Experiments with many different speech features (e.g., glottal, landmark, linguistic, prosodic, spectral, syllabic, vocal tract coordination) across different blood alcohol concentration (BAC) ranges and specific verbal tasks show significant changes as participants' BAC increases. Compared with their non-intoxicated recordings, participants' intoxicated recordings show a lower average fundamental frequency (F0) together with increased F0 frequency modulation, breathiness, and creakiness. For the picture description and tongue twister tasks, manual irregularity disfluency and pause linguistic features increase significantly in intoxicated recordings. Further, for all verbal tasks, automatically extracted syllabic pause features show a significant increase in intoxicated recordings. Implementing a task-dependent support vector machine classifier with a ≥0.001 BAC 'intoxication' sensitivity threshold improves alcohol classification by up to 8% absolute over a task-agnostic approach. Moreover, the intoxication classification results show that task-dependent modeling with a majority-vote decision improves classification accuracy by up to 20% absolute, depending on the task, compared with the file-by-file task-agnostic results previously reported in ALC baseline studies, which used higher-quality headset microphone recordings.
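As a rough illustration of the two decision steps named in the abstract (BAC-threshold labeling and a majority vote over per-file predictions), the following minimal Python sketch may help. It is not the authors' implementation: the function names, the grouping by `(session_id, task)`, and the tie-breaking rule are all assumptions made here for illustration, and the per-file predictions are assumed to come from some task-dependent classifier that is not shown.

```python
from collections import Counter, defaultdict

# Threshold value quoted in the abstract; how it is applied is an assumption.
BAC_THRESHOLD = 0.001


def label_from_bac(bac, threshold=BAC_THRESHOLD):
    """Return 1 ('intoxicated') when BAC >= threshold, otherwise 0."""
    return int(bac >= threshold)


def majority_vote(file_predictions):
    """Combine per-file 0/1 predictions into a single 0/1 decision.

    Ties are resolved toward 1 ('intoxicated'); this is a hypothetical,
    safety-leaning choice, not something stated in the source.
    """
    if not file_predictions:
        raise ValueError("need at least one prediction to vote on")
    counts = Counter(file_predictions)
    return int(counts[1] >= counts[0])


def decide_per_task(predictions):
    """Group (session_id, task, label) triples and vote within each group.

    Grouping by (session_id, task) is an assumed reading of
    'task-dependent modeling with majority vote decision'.
    """
    grouped = defaultdict(list)
    for session_id, task, label in predictions:
        grouped[(session_id, task)].append(label)
    return {key: majority_vote(labels) for key, labels in grouped.items()}
```

For example, three tongue twister files from one session predicted as 1, 0, 1 would yield a single 'intoxicated' decision for that session and task, whereas a file-by-file approach would score each file separately.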
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.