HearDrinking: Drunkenness detection and BACs predictions based on acoustic signal

IF 3.5 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Pervasive and Mobile Computing Pub Date : 2025-02-10 DOI:10.1016/j.pmcj.2025.102020

Yuan Wu, Gaorong Zhao, Likairui Zhang, Xinrong Hu, Lei Ding

Pervasive and Mobile Computing, Volume 108, Article 102020. https://www.sciencedirect.com/science/article/pii/S1574119225000094

Citations: 0

Abstract

Alcohol poisoning is a severe health concern that results from excessive drinking and can be life-threatening. With home monitoring, individuals can quickly determine their blood alcohol content and keep it from reaching hazardous levels. However, most existing drunkenness detection systems require extra hardware or considerable user effort, which makes them impractical in real life. Motivated by this, we present HearDrinking, a device-free, noise-resistant drunkenness detection system that uses a smartphone's microphone to record a person's voice activity and then mines drunkenness-related features to yield accurate detection. Detecting drunkenness from acoustic signals is non-trivial for two reasons: voice activity is easily corrupted by ambient noise, and extracting fine-grained drunkenness-related representations from voice activity remains an open problem. To address the first challenge, HearDrinking employs a multi-modal fusion method for noise-resistant voice activity detection. To address the second, HearDrinking first computes log-Mel spectrograms from the speech signal. Log-Mel spectrograms carry temporal and spectral information that is absent in image data, so conventional convolutions designed for images are often of limited effectiveness at extracting features from them. To overcome this limitation, we integrate Omni-dimensional Dynamic Convolution (ODConv) with ShuffleNetV2 to create OD-ShuffleNetV2, in which ODConv replaces certain conventional convolutions of the ShuffleNetV2 network. Multiple convolution kernels are fused according to the log-Mel spectrogram under multi-dimensional attention, thereby optimizing the network structure. Comprehensive experiments with 15 participants show a drunkenness detection accuracy of 96.08% and Blood Alcohol Content (BAC) predictions with an average error of 5 mg/dl.
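The log-Mel spectrogram step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the abstract does not state the sampling rate, window length, hop size, or number of Mel bands, so the values below (16 kHz, 512-sample Hann window, 160-sample hop, 40 bands) and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz -> mel mapping
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # n_mels triangular filters, evenly spaced on the mel scale from 0 Hz to sr/2
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # All default parameters are illustrative assumptions, not values from the paper
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and apply a Hann window
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame: shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto mel bands, then convert to decibels
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10).T  # shape (n_mels, n_frames)

# Usage: one second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
print(spec.shape)  # (40, 97)
```

The result is a two-dimensional array with Mel bands on one axis and time frames on the other, which is the kind of input a convolutional network such as the one described in the abstract would consume.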


Source journal

Pervasive and Mobile Computing (COMPUTER SCIENCE, INFORMATION SYSTEMS; TELECOMMUNICATIONS)

CiteScore: 7.70

Self-citation rate: 2.30%

Review time: 68 days

About the journal: As envisioned by Mark Weiser as early as 1991, pervasive computing systems and services have truly become integral parts of our daily lives. Tremendous developments in a multitude of technologies ranging from personalized and embedded smart devices (e.g., smartphones, sensors, wearables, IoTs, etc.) to ubiquitous connectivity, via a variety of wireless mobile communications and cognitive networking infrastructures, to advanced computing techniques (including edge, fog and cloud) and user-friendly middleware services and platforms have significantly contributed to the unprecedented advances in pervasive and mobile computing. Cutting-edge applications and paradigms have evolved, such as cyber-physical systems and smart environments (e.g., smart city, smart energy, smart transportation, smart healthcare, etc.) that also involve humans in the loop, for example through social interactions and participatory and/or mobile crowd sensing. The goal of pervasive computing systems is to improve human experience and quality of life, without explicit awareness of the underlying communications and computing technologies. The Pervasive and Mobile Computing Journal (PMC) is a high-impact, peer-reviewed technical journal that publishes high-quality scientific articles spanning theory and practice, and covering all aspects of pervasive and mobile computing and systems.