Intoxication detection from speech using representations learned from self-supervised pre-training

Abigail Albuquerque, Samuel Chibuoyim Uche, Emmanuel Agu

Smart Health, Volume 36, Article 100562 (published 2025-03-27). DOI: 10.1016/j.smhl.2025.100562
Available at: https://www.sciencedirect.com/science/article/pii/S2352648325000236
Abstract
Alcohol intoxication is one of the leading causes of death worldwide. Existing approaches to prevent Driving Under the Influence (DUI) are expensive, intrusive, or require external apparatus such as breathalyzers, which the drinker may not possess. Speech is a viable modality for detecting intoxication from changes in vocal patterns: intoxicated speech is slower, has lower amplitude, and is more prone to errors at the sentence, word, and phonological levels than sober speech. However, intoxication detection from speech is challenging due to high inter- and intra-user variability and the confounding effects of other factors, such as fatigue, that can also impair speech. This paper investigates Wav2Vec 2.0, a self-supervised neural network architecture, for intoxication classification from audio. Wav2Vec 2.0 is a Transformer-based model that has demonstrated remarkable performance across a range of speech tasks. It operates on raw audio directly, applying a multi-head attention mechanism to latent audio representations, and was pre-trained on the LibriSpeech, Libri-Light, and EmoDB datasets. The proposed model achieved an unweighted average recall of 73.3%, outperforming state-of-the-art models and highlighting its potential for accurate DUI detection to prevent alcohol-related incidents.
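To make the approach concrete, below is a minimal sketch of how a pre-trained Wav2Vec 2.0 backbone can be paired with a sequence-classification head for sober/intoxicated prediction, and of how unweighted average recall (UAR), the metric reported above, is computed. This is not the authors' released code: the `facebook/wav2vec2-base` checkpoint, the two-label setup, and the `classify` helper are illustrative assumptions, and the classification head here is freshly initialized, so it would need fine-tuning on labeled intoxication data before its predictions mean anything.

```python
# Minimal sketch (assumed setup, not the paper's code): a pre-trained
# Wav2Vec 2.0 backbone with a 2-class head for sober vs. intoxicated
# speech, plus the UAR metric computation.
import torch
from sklearn.metrics import recall_score
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Checkpoint choice is an assumption; this one was pre-trained on LibriSpeech.
CHECKPOINT = "facebook/wav2vec2-base"

extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=2,  # hypothetical labels: 0 = sober, 1 = intoxicated
)
model.eval()  # the new head is randomly initialized and would need fine-tuning


def classify(waveform: torch.Tensor, sample_rate: int = 16_000) -> int:
    """Predict a class index for a mono waveform sampled at 16 kHz."""
    inputs = extractor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return int(logits.argmax(dim=-1))


print(classify(torch.randn(16_000)))  # 1 second of dummy audio as a stand-in

# Unweighted average recall (UAR) is the mean of per-class recalls,
# i.e. macro-averaged recall, the metric behind the reported 73.3%.
y_true = [0, 0, 0, 1, 1, 1]  # toy ground-truth labels, for illustration only
y_pred = [0, 1, 0, 1, 1, 0]
print(f"UAR = {recall_score(y_true, y_pred, average='macro'):.3f}")  # (2/3 + 2/3) / 2 = 0.667
```

UAR is a natural choice for this task because intoxicated recordings are typically much rarer than sober ones; averaging per-class recalls weights both classes equally regardless of that imbalance.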