Robust Estimation of Gaussian Mixture Models Using Anomaly Scores and Bayesian Information Criterion for Missing Value Imputation
Florian Mouret, Mohanad Albughdadi, S. Duthoit, D. Kouamé, J. Tourneret
2022 30th European Signal Processing Conference (EUSIPCO), published 2022-08-29
DOI: 10.23919/eusipco55093.2022.9909815 (https://doi.org/10.23919/eusipco55093.2022.9909815)
Citation count: 0
Abstract
The Expectation-Maximization (EM) algorithm is a popular approach for estimating the parameters of Gaussian mixture models (GMMs). A known issue with GMM estimation is its sensitivity to outliers, which can lead to poor estimation performance on some datasets. A common remedy is robust estimation, which typically reduces the influence of outliers on the estimators by down-weighting the samples considered to be outliers. In an unsupervised context, however, it is difficult to know which samples of the dataset correspond to normal observations. To that end, we propose to include within the EM algorithm an outlier detection step that assigns an anomaly score to each sample of the dataset in an unsupervised way. A modified Bayesian Information Criterion (BIC) is also introduced to efficiently select the appropriate number of outliers contained in a dataset. The proposed method is tested on a benchmark remote sensing dataset from the UCI Machine Learning Repository. The experimental results show the benefit of the proposed robustification when compared to other benchmark imputation procedures.
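The general idea of down-weighting suspected outliers inside EM can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the anomaly detector, the weighting rule, or the modified BIC, so everything below is an assumption made for illustration. The anomaly score is a simple median/MAD robust z-score (`mad_scores`, a hypothetical stand-in), the weights are hard 0/1 values set from an assumed outlier fraction (`outlier_weights`, also hypothetical), the model is one-dimensional, and the outlier fraction is fixed by hand rather than selected by a criterion.

```python
import math
import random
import statistics


def mad_scores(x):
    """Stand-in anomaly score (an assumption): robust z-score from median and MAD."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x]) or 1e-12
    return [abs(v - med) / (1.4826 * mad) for v in x]


def outlier_weights(scores, outlier_fraction):
    """Give weight 0 to the highest-scoring fraction of samples and 1 to the rest."""
    n_out = round(outlier_fraction * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_out])
    return [0.0 if i in flagged else 1.0 for i in range(len(scores))]


def normal_pdf(v, mu, var):
    return math.exp(-0.5 * (v - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_em_gmm(x, w, k=2, n_iter=100):
    """Weighted EM for a 1-D GMM; returns (mixing proportions, means, variances)."""
    kept = sorted(v for v, wi in zip(x, w) if wi > 0)
    # Deterministic initialisation: evenly spaced quantiles of the retained samples.
    mus = [kept[(2 * j + 1) * len(kept) // (2 * k)] for j in range(k)]
    variances = [statistics.pvariance(kept)] * k
    pis = [1.0 / k] * k
    w_total = sum(w)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        # The tiny additive floor avoids a zero denominator for far-away points.
        resp = []
        for v in x:
            p = [pis[j] * normal_pdf(v, mus[j], variances[j]) + 1e-300 for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: every sufficient statistic is scaled by the sample weight w_i,
        # so samples with weight 0 do not influence the estimates.
        for j in range(k):
            n_j = sum(wi * r[j] for wi, r in zip(w, resp))
            mus[j] = sum(wi * r[j] * v for wi, r, v in zip(w, resp, x)) / n_j
            variances[j] = (
                sum(wi * r[j] * (v - mus[j]) ** 2 for wi, r, v in zip(w, resp, x)) / n_j
                + 1e-6  # small floor to keep the variance strictly positive
            )
            pis[j] = n_j / w_total
    return pis, mus, variances


# Synthetic data: two Gaussian clusters plus roughly 5% gross outliers.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)]
data += [rng.gauss(3.0, 1.0) for _ in range(200)]
data += [rng.uniform(30.0, 60.0) for _ in range(20)]

weights = outlier_weights(mad_scores(data), outlier_fraction=0.05)
pis, mus, variances = weighted_em_gmm(data, weights)
print("robust means:", sorted(mus))
```

Setting all weights to 1 recovers ordinary EM, which makes it easy to compare the two fits on the same data. In this sketch the outlier fraction is a hand-set hyperparameter; the abstract indicates that the paper instead selects the number of outliers with a modified BIC, which is not reproduced here.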