Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications: Latest Publications

Speech Disfluency Detection with Contextual Representation and Data Distillation
Payal Mohapatra, Akash Pandey, Bashima Islam, Qi Zhu
DOI: 10.1145/3539490.3539601 | Published: 2022-06-25

Abstract: Stuttering affects almost 1% of the world's population. It has a deep sociological impact and hinders people who stutter from taking advantage of voice-assisted services. Automatic stutter detection based on deep learning can help voice assistants adapt to atypical speech. However, disfluency data is very limited and expensive to generate. We propose a set of preprocessing techniques: (1) using data with high inter-annotator agreement, (2) balancing different classes, and (3) using contextual embeddings from a pretrained network. We then design a disfluency classification network (DisfluencyNet) for automated speech disfluency detection that takes these contextual embeddings as input. We empirically demonstrate high performance using only a quarter of the data for training. We conduct experiments with different training data sizes, evaluate the model trained on the smallest amount of training data against the SEP-28k baseline results, and evaluate the same model against the FluencyBank baseline results. We observe that, even using a quarter of the original dataset, our F1 score is greater than 0.7 for all types of disfluencies except one, "blocks". Previous works also reported lower performance on the "blocks" type of disfluency owing to its large diversity amongst speakers and events. Overall, with our approach using only a few minutes of data, we can train a robust network that outperforms the baseline results for all disfluencies by at least 5%. This result matters because it shows we can reduce the required amount of training data and improve dataset quality by appointing more than two annotators to label speech disfluency within a constrained labeling budget.

Citations: 5
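A minimal sketch of the pipeline the abstract describes: contextual embeddings from a frozen pretrained speech encoder feeding a small trainable classification head. The encoder choice (wav2vec 2.0), head sizes, and number of classes are our illustrative assumptions, not the paper's exact DisfluencyNet configuration.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

class DisfluencyClassifier(nn.Module):
    def __init__(self, num_classes=5, embed_dim=768):
        super().__init__()
        # Frozen pretrained encoder supplies the contextual embeddings.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.encoder.requires_grad_(False)
        # Lightweight head, trained on the small disfluency dataset.
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, input_values):
        with torch.no_grad():
            hidden = self.encoder(input_values).last_hidden_state  # (B, T, 768)
        pooled = hidden.mean(dim=1)  # average over time frames
        return self.head(pooled)

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = DisfluencyClassifier()
audio = torch.randn(16000 * 3)  # stand-in for a 3 s clip at 16 kHz
inputs = extractor(audio.numpy(), sampling_rate=16000, return_tensors="pt")
logits = model(inputs.input_values)
```

In this framing, only the small head is trained, which is why a quarter of the data (or a few minutes of audio) can suffice; the class-balancing and high-agreement filtering steps (1) and (2) would be applied to the clips before training.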
Beyond Microphone: mmWave-Based Interference-Resilient Voice Activity Detection
M. Z. Ozturk, Chenshu Wu, Beibei Wang, Min Wu, K. Liu
DOI: 10.1145/3539490.3539599 | Published: 2022-06-25

Abstract: Microphone-based voice activity detection (VAD) systems usually require hotword detection, and they perform poorly in the presence of interference and noise. Users attending online meetings in noisy environments usually mute and unmute their microphones manually because of the limited performance of interference-resilient VAD. To automate voice detection in challenging environments without dictionary limitations, we look beyond microphones and propose mmWave-based sensing, which is already available in many smartphones and IoT devices. Our preliminary experiments in multiple places with several users indicate that mmWave-based VAD can match and surpass the performance of audio-based VAD in noisy conditions while being robust against interference.

Citations: 3
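One plausible mmWave VAD primitive, sketched below: measure band-limited vibration energy in the radar phase signal at the speaker's range bin, since voicing produces faster skin vibration than breathing or body motion. The frame rate, band edges, and threshold are illustrative assumptions, not the paper's design.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def mmwave_vad(cir_bin, frame_rate=1000.0, band=(80.0, 400.0), thresh=1e-3):
    """cir_bin: complex channel-impulse-response samples over time
    at the target's range bin."""
    phase = np.unwrap(np.angle(cir_bin))       # proxy for chest/throat displacement
    b, a = butter(4, band, btype="bandpass", fs=frame_rate)
    vib = filtfilt(b, a, phase)                # keep voicing-rate vibration only
    return np.mean(vib ** 2) > thresh          # True => voice activity

# Synthetic check: a 150 Hz voicing vibration riding on slow breathing motion.
t = np.arange(0, 2, 1 / 1000.0)
active = np.exp(1j * (0.5 * np.sin(2 * np.pi * 0.3 * t)      # breathing
                      + 0.05 * np.sin(2 * np.pi * 150 * t)))  # voicing
print(mmwave_vad(active))  # True; a breathing-only signal would print False
```

Because the decision is made on radar phase rather than sound pressure, acoustic interference such as a loudspeaker across the room does not enter the detector at all, which is the source of the interference resilience the abstract claims.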
VoiceFind
Irtaza Shahid, Y. Bai, Nakul Garg, Nirupam Roy
{"title":"VoiceFind","authors":"Irtaza Shahid, Y. Bai, Nakul Garg, Nirupam Roy","doi":"10.1145/3539490.3539600","DOIUrl":"https://doi.org/10.1145/3539490.3539600","url":null,"abstract":"Robust speech enhancement is a key requirement for many emerging applications. It is challenging to recover clear speech in commodity devices, especially in noisy real-world scenarios. In this paper, we propose VoiceFind, which uses only two microphones to spatial filter the desired speech from all interference. Furthermore, to improve the intelligibility of the speech after filtering, we design a Conditional Generative Adversarial Network (cGAN) to reconstruct the desired speech from environmental noises and interference speeches. This is an early attempt to explore this direction. Results from simulation and real-world experiments show promise.","PeriodicalId":377149,"journal":{"name":"Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116717556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
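A minimal sketch of the two-microphone spatial-filtering stage: a delay-and-sum beamformer steered toward the desired talker. The microphone spacing and steering angle below are illustrative assumptions, and the cGAN reconstruction stage that follows this filter in VoiceFind is not shown.

```python
import numpy as np

def delay_and_sum(mic1, mic2, angle_deg, spacing=0.1, fs=16000, c=343.0):
    """Advance mic2 by the inter-mic delay for a source at angle_deg
    off broadside, so both channels align before summing."""
    delay = spacing * np.sin(np.deg2rad(angle_deg)) / c   # seconds
    shift = delay * fs                                    # fractional samples
    n = len(mic2)
    # Fractional-sample time advance via a linear phase ramp in frequency.
    spectrum = np.fft.rfft(mic2)
    freqs = np.fft.rfftfreq(n, d=1.0)                     # cycles per sample
    aligned = np.fft.irfft(spectrum * np.exp(2j * np.pi * freqs * shift), n)
    return 0.5 * (mic1 + aligned)
```

Signals arriving from the steered direction add coherently while off-axis interference adds incoherently; with only two microphones the spatial selectivity is modest, which is why a learned enhancement stage is still needed for intelligibility.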
Conversational AI Therapist for Daily Function Screening in Home Environments
Jingping Nie, Hanya Shao, Minghui Zhao, S. Xia, M. Preindl, Xiaofan Jiang
{"title":"Conversational AI Therapist for Daily Function Screening in Home Environments","authors":"Jingping Nie, Hanya Shao, Minghui Zhao, S. Xia, M. Preindl, Xiaofan Jiang","doi":"10.1145/3539490.3539603","DOIUrl":"https://doi.org/10.1145/3539490.3539603","url":null,"abstract":"The growth of smart devices is making typical homes more intelligent. In this work, in collaboration with therapists, we introduce a home-based AI therapist that takes advantage of the smart home environment to screen the day-to-day functioning and infer mental wellness of an occupant. Unlike existing “chatbot” works that identify the mental status of users through conversation, our AI therapist additionally leverages smart devices and sensors throughout the home to infer mental well-being and assesses a user's daily functioning. We propose a series of 37 dimensions of daily functioning, that our system observes through conversing with the user and detecting daily activity events using sensors and smart sensors throughout the home. Our system utilizes these 37 dimensions in conjunction with novel natural language processing architectures to detect abnormalities in mental status (e.g., angry or depressed), well-being, and daily functioning and generate responses to console users when abnormalities are detected. Through a series of user studies, we demonstrate that our system can converse with a user naturally, accurately detect abnormalities in well-being, and provide appropriate responses consoling users.","PeriodicalId":377149,"journal":{"name":"Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114196849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
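A toy sketch of one way the screening step could work: score each of the 37 daily-functioning dimensions, compare against a per-user baseline, and flag large deviations. The z-score rule is entirely our illustrative assumption; the paper's NLP architectures and actual abnormality criteria are not reproduced here.

```python
import numpy as np

def screen(today, baseline_mean, baseline_std, z_thresh=2.0):
    """All inputs are length-37 vectors of daily-functioning scores
    (from conversation and in-home activity sensing)."""
    z = (today - baseline_mean) / np.maximum(baseline_std, 1e-6)
    flagged = np.where(np.abs(z) > z_thresh)[0]
    return flagged  # indices of dimensions that look abnormal today
```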
An Empirical Analysis of Perforated Audio Classification
Mahathir Monjur, S. Nirjon
{"title":"An Empirical Analysis of Perforated Audio Classification","authors":"Mahathir Monjur, S. Nirjon","doi":"10.1145/3539490.3539602","DOIUrl":"https://doi.org/10.1145/3539490.3539602","url":null,"abstract":"Missing samples is common in many practical audio acquisition systems. These emph{perforated} audio clips are routinely discarded by today's audio classification systems -- even though they may have information that could have been used to make accurate inferences. In this paper, we study perforated audio classification problem on an intermittently-powered batteryless system. We model perforation, demonstrate how it affects the classification accuracy, and propose two approaches to deal with the problem. We conduct extensive experiments using over 115,000 audio clips from three popular audio datasets and quantify the loss of accuracy of a standard classifier when the input audio is perforated. We also empirically demonstrate how much of the loss of accuracy can be gained back by the two proposed approaches to deal with audio perforation.","PeriodicalId":377149,"journal":{"name":"Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications","volume":"22 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132968955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
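A sketch of the kind of perforation an intermittently powered recorder implies: audio is captured in bursts while energy is available, and the samples in between are lost. The on/off durations below are illustrative assumptions, not the paper's measured duty cycle.

```python
import numpy as np

def perforate(audio, fs=16000, on_ms=200, off_ms=100):
    """Zero out the samples that fall in the recorder's off periods."""
    out = audio.copy()
    on = int(fs * on_ms / 1000)
    off = int(fs * off_ms / 1000)
    period = on + off
    for start in range(on, len(out), period):
        out[start:start + off] = 0.0  # samples missed while harvesting energy
    return out
```

Running a trained classifier on `perforate(clip)` versus `clip` gives exactly the accuracy-drop comparison the study performs at scale across its three datasets.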
BuMA
Kaiyuan Hou, S. Xia, Xiaofan Jiang
{"title":"BuMA","authors":"Kaiyuan Hou, S. Xia, Xiaofan Jiang","doi":"10.1145/3539490.3539598","DOIUrl":"https://doi.org/10.1145/3539490.3539598","url":null,"abstract":"Breath monitoring is important for monitoring illnesses, such as sleep apnea, for people of all ages. One cause of concern for parents is sudden infant death syndrome (SIDS), where an infant suddenly passes away during sleep, usually due to complications in breathing. There are a variety of works and products on the market for monitoring breathing, especially for children and infants. Many of these are wearables that require you to attach an accessory onto the child or person, which can be uncomfortable. Other solutions utilize a camera, which can be privacy-intrusive and function poorly during the night, when lighting is poor. In this work, we introduce BuMA, an audio-based, non-intrusive, and contactless, breathing monitoring system. BuMA utilizes a microphone array, beamforming, and audio filtering to enhance the sounds of breathing by filtering out several common noises in or near home environments, such as construction, speech, and music, that could make detection difficult. We show that BuMA improves breathing detection accuracy by up to 12%, within 30cm from a person, over existing audio filtering algorithms or platforms that do not leverage filtering.","PeriodicalId":377149,"journal":{"name":"Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124745853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
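A sketch of the filtering idea: after beamforming the array toward the infant, keep the frequency band where breathing sounds concentrate and attenuate speech, music, and construction noise outside it. The band edges and filter order are illustrative assumptions, not BuMA's tuned values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def enhance_breathing(beamformed, fs=16000, band=(100.0, 1000.0)):
    """Band-pass the beamformed channel to emphasize breathing sounds."""
    sos = butter(6, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, beamformed)
```

The combination matters: beamforming suppresses noise by direction, and the band-pass stage suppresses what remains by frequency, which is the layering the reported 12% accuracy gain attributes to filtering.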