The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
Gabriel Bibbó, Thomas Deacon, Arshdeep Singh, Mark D. Plumbley
arXiv - EE - Audio and Speech Processing, 2024-09-17. DOI: arxiv-2409.11262
Abstract
This paper presents a residential audio dataset to support sound event
detection research for smart home applications aimed at promoting wellbeing for
older adults. The dataset is constructed by deploying audio recording systems
in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic
characteristics are documented through detailed floor plans and construction
material information to enable replication of the recording environments for AI
model deployment. A novel automated speech removal pipeline is developed, using
pre-trained audio neural networks to detect and remove segments containing
spoken voice, while preserving segments containing other sound events. The
resulting dataset consists of privacy-compliant audio recordings that
accurately capture the soundscapes and activities of daily living within
residential spaces. The paper details the dataset creation methodology, the
speech removal pipeline utilizing cascaded model architectures, and an analysis
of the vocal label distribution to validate the speech removal process. This
dataset enables the development and benchmarking of sound event detection
models tailored specifically for in-home applications.
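
The segment-level speech removal described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give the segment length, the decision threshold, or how the cascaded models are combined, so `segment_len`, `threshold`, the "drop if any stage flags speech" rule, and the toy `loud_detector` are all assumptions made for illustration. In practice each detector would wrap a pre-trained audio neural network.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a detector maps one audio segment (a sequence of samples)
# to an estimated probability that the segment contains speech.
SpeechDetector = Callable[[Sequence[float]], float]


def split_into_segments(samples: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Cut a recording into consecutive fixed-length segments (the last may be shorter)."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]


def remove_speech(
    samples: Sequence[float],
    detectors: Sequence[SpeechDetector],
    segment_len: int,
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Sequence[float]]:
    """Keep only the segments that no detector in the cascade flags as speech."""
    kept = []
    for segment in split_into_segments(samples, segment_len):
        # any() short-circuits: later detectors run only if earlier ones pass the segment.
        flagged = any(detector(segment) >= threshold for detector in detectors)
        if not flagged:
            kept.append(segment)
    return kept


def loud_detector(segment: Sequence[float]) -> float:
    """Toy stand-in for a neural speech detector: flags segments with a large peak."""
    return 1.0 if max(abs(x) for x in segment) > 0.8 else 0.0


recording = [0.0, 0.1, 0.9, 0.2, 0.0, 0.0]
clean_segments = remove_speech(recording, [loud_detector], segment_len=2)
```

Dropping a segment when any stage flags it favours privacy over recall of non-speech events; a real deployment would tune that trade-off against labelled data.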