A Dataset for Foreground Speech Analysis With Smartwatches In Everyday Home Environments

Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David F. Harwath, Edison Thomaz

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
Published: 2023-06-04
DOI: 10.1109/ICASSPW59220.2023.10192949
Citations: 0
Abstract
Acoustic sensing has proved effective as a foundation for applications in health and human behavior analysis. In this work, we focus on detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step, it is critical to distinguish the speech of the individual wearing the watch (foreground speech) from all other sounds nearby, such as speech from other individuals and ambient sounds. Given the considerable burden of collecting and annotating real-world training data and the lack of existing online data resources, this paper introduces a dataset for foreground speech detection of users wearing a smartwatch. The data is collected from 39 participants interacting with family members in real homes. We then present a benchmark study for the dataset with different test setups. Furthermore, we explore a model-free heuristic method to identify foreground instances based on transfer learning embeddings.
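The abstract does not spell out how the model-free heuristic uses the transfer-learning embeddings. Purely as an illustration of how such a heuristic could work, the sketch below thresholds the cosine similarity between per-frame audio embeddings and an enrollment embedding of the watch wearer. Every name, dimension, and threshold here is an assumption for demonstration, not the paper's actual method, and the toy 3-dimensional vectors stand in for real audio embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def detect_foreground(frame_embeddings, wearer_embedding, threshold=0.7):
    """Label each audio frame as foreground (wearer) speech when its
    embedding lies close to the wearer's enrollment embedding.
    No model is trained; the decision is a fixed similarity threshold."""
    return [cosine(e, wearer_embedding) >= threshold
            for e in frame_embeddings]

# Toy example: 3-dim embeddings standing in for real ones.
wearer = [1.0, 0.0, 0.0]
frames = [
    [0.9, 0.1, 0.0],  # near the wearer embedding -> foreground
    [0.0, 1.0, 0.0],  # a different speaker      -> background
    [0.8, 0.0, 0.2],  # near the wearer again    -> foreground
]
print(detect_foreground(frames, wearer))  # [True, False, True]
```

In practice the enrollment embedding might be averaged over a few seconds of known wearer speech, and the threshold tuned on held-out data; both choices are hypothetical here.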