Challenges with Audio Classification using Image Based Approaches for Health Measurement Applications

Madison Cohen-McFarlane, R. Goubran, Bruce Wallace
Published in: 2020 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 2020-06-01
DOI: 10.1109/MeMeA49120.2020.9137254 (https://doi.org/10.1109/MeMeA49120.2020.9137254)
Citations: 4

Abstract

Image classification has had huge success in recent years, mainly due to the vast array of image databases available. The lack of comparable audio databases is a problem when creating a deep neural network classifier for the measurement and monitoring of health-related sounds. Such sounds (e.g., cough) can indicate worsening health conditions, particularly in the remote monitoring of older adults. Applying pre-existing deep neural network image classifiers to audio classification has been presented as a potential solution. This paper describes some of the issues associated with using audio spectrograms to retrain the AlexNet image classifier for remote patient monitoring. The spatial invariance assumption of the classifier is investigated further through two classification tasks based on spectrograms computed from notes on a classical piano at four different noise levels: (1) octave classification and (2) note classification. As expected, the AlexNet classifier on clean data performs better when classifying octaves (98%) than when classifying notes (83%). When evaluated on audio with noise, the note classifier's performance decreases more than the octave classifier's.
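The two tasks in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a pure sine tone stands in for a piano recording, and the sample rate, FFT size, hop length, 20 dB SNR, and all function names are assumptions chosen for illustration (the abstract does not state the four noise levels or the spectrogram settings). The sketch builds a noisy magnitude spectrogram and prints where the tone's energy sits on the frequency axis. One plausible reading of the spatial invariance issue is that notes a semitone apart land only a bin or so apart, while notes an octave apart land much further apart.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_hz(midi):
    """Equal-temperament frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def labels(midi):
    """Return the (note name, octave) labels for the two classification tasks."""
    return NOTE_NAMES[midi % 12], midi // 12 - 1

def tone(midi, sr=16000, dur=1.0):
    """Pure sine tone; a stand-in for a recorded piano note (assumption)."""
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * midi_to_hz(midi) * t)

def add_noise(x, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), x.shape)

def spectrogram(x, n_fft=1024, hop=256):
    """Magnitude spectrogram with shape (frequency bins, time frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def peak_bin(spec):
    """Frequency bin with the highest energy averaged over time."""
    return int(np.argmax(spec.mean(axis=1)))

rng = np.random.default_rng(0)
for midi in (60, 61, 72):  # C4, C#4, C5
    spec = spectrogram(add_noise(tone(midi), snr_db=20, rng=rng))
    print(labels(midi), "peak bin:", peak_bin(spec))
```

In a retraining setup like the one the abstract describes, each spectrogram would then be rendered as an image and resized to the classifier's input size; that step is omitted here because the abstract gives no details about it.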