Dealing with Distribution Shift in Acoustic Mosquito Datasets
H. Y. Nkouanga, Suresh Singh
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022-12-01
DOI: 10.1109/ICMLA55696.2022.00246 (https://doi.org/10.1109/ICMLA55696.2022.00246)
Abstract
In recent years, detecting the presence of mosquitoes from acoustic data has drawn the attention of many researchers. As in any other detection task, however, these researchers often face the distribution shift problem, in which the training and test datasets do not share the same distribution. When this happens, a detection system is almost certain to fail at test time. Solutions have been proposed over the years, but they are often computationally expensive and complex to implement. In this paper, we propose a simple solution consisting of three steps: (1) identifying and removing the noise present in the input data, (2) performing dimensionality reduction, and (3) classifying the data. We tested our technique on a large, publicly available dataset of mosquito recordings (HumBugDB), and the results showed a maximum improvement of nearly 28% over a baseline classification system.
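The abstract does not say which techniques implement the three steps, so the following is only an illustrative sketch of the general shape of such a pipeline, not the authors' method. It assumes median noise-floor subtraction for step (1), PCA via SVD for step (2), and a nearest-centroid classifier for step (3). The data are synthetic stand-ins for spectral frames rather than HumBugDB recordings, and every function name, band position, and parameter value is hypothetical.

```python
import numpy as np

def remove_noise_floor(frames):
    """Step 1 (assumed technique): subtract a per-bin noise-floor estimate,
    taken as the median over frames of one recording, and clip at zero."""
    floor = np.median(frames, axis=0, keepdims=True)
    return np.clip(frames - floor, 0.0, None)

def fit_pca(X, n_components):
    """Step 2 (assumed technique): PCA via SVD of the centered data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    return (X - mean) @ components.T

def fit_centroids(Z, y):
    """Step 3 (assumed technique): one centroid per class."""
    labels = np.unique(y)
    return labels, np.stack([Z[y == c].mean(axis=0) for c in labels])

def predict(Z, labels, centroids):
    # Assign each frame to the class of its nearest centroid.
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

def make_recording(rng, background_level, n_frames=200, n_bins=64):
    """Synthetic recording: a constant background level plus Gaussian noise,
    with extra energy in a hypothetical band on mosquito frames."""
    y = (rng.random(n_frames) < 0.3).astype(int)  # 1 = mosquito frame
    X = background_level + 0.5 * rng.standard_normal((n_frames, n_bins))
    X[y == 1, 10:15] += 5.0
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_recording(rng, background_level=1.0)
# The test recording has a different background level: a simple distribution shift.
X_test, y_test = make_recording(rng, background_level=4.0)

def run(denoise):
    train = remove_noise_floor(X_train) if denoise else X_train
    test = remove_noise_floor(X_test) if denoise else X_test
    mean, components = fit_pca(train, n_components=2)
    labels, centroids = fit_centroids(apply_pca(train, mean, components), y_train)
    pred = predict(apply_pca(test, mean, components), labels, centroids)
    return float(np.mean(pred == y_test))

accuracy_baseline = run(denoise=False)  # steps 2 and 3 only
accuracy_pipeline = run(denoise=True)   # steps 1, 2 and 3
print("baseline:", accuracy_baseline, "with noise removal:", accuracy_pipeline)
```

In this toy setting the shift lives entirely in the background level, so removing a per-recording noise floor before fitting the projection brings the training and test frames back onto a comparable scale. Real recordings would need a noise estimate suited to the actual recording conditions.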