Identifying Multiple Outliers in Multivariate Data
A. Hadi
Journal of the Royal Statistical Society, Series B (Methodological), pp. 761-771. Published 1992-07-01.
DOI: https://doi.org/10.1111/J.2517-6161.1992.TB01449.X
SUMMARY: We propose a procedure for the detection of multiple outliers in multivariate data. Let X be an n × p data matrix representing n observations on p variates. We first order the n observations, using an appropriately chosen robust measure of outlyingness, then divide the data set into two initial subsets: a 'basic' subset which contains p + 1 'good' observations and a 'non-basic' subset which contains the remaining n - p - 1 observations. Second, we compute the relative distance from each point in the data set to the centre of the basic subset, relative to the (possibly singular) covariance matrix of the basic subset. Third, we rearrange the n observations in ascending order accordingly, then divide the data set into two subsets: a basic subset which contains the first p + 2 observations and a non-basic subset which contains the remaining n - p - 2 observations. This process is repeated until an appropriately chosen stopping criterion is met. The final non-basic subset of observations is declared an outlying subset. The procedure proposed is illustrated and compared with existing methods by using several data sets. The procedure is simple, computationally inexpensive, suitable for automation, computable with widely available software packages, effective in dealing with masking and swamping problems and, most importantly, successful in identifying multivariate outliers.
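The iteration described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not name the robust measure or the stopping criterion, so the sketch assumes an initial ordering by distance about the coordinate-wise median, a pseudo-inverse to cope with a singular covariance matrix, and a user-supplied cutoff on the squared distance that is only checked once the basic subset holds about half of the data. The names `squared_distances`, `forward_outlier_search`, and `cutoff` are hypothetical.

```python
import numpy as np


def squared_distances(X, center, cov):
    """Squared Mahalanobis-type distances; pinv tolerates a singular cov."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)


def forward_outlier_search(X, cutoff):
    """Return sorted indices of the final non-basic (flagged) subset.

    Assumptions of this sketch (not taken from the summary):
    - initial ordering uses distances about the coordinate-wise median;
    - the search stops when the nearest non-basic point has a squared
      distance above `cutoff`, checked only once the basic subset has
      at least (n + p + 1) // 2 observations.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Step 1: order observations by an (assumed) robust outlyingness measure.
    med = np.median(X, axis=0)
    cov0 = (X - med).T @ (X - med) / (n - 1)
    order = np.argsort(squared_distances(X, med, cov0), kind="stable")

    half = (n + p + 1) // 2
    size = p + 1  # the initial basic subset has p + 1 observations
    while size < n:
        basic = order[:size]
        center = X[basic].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[basic], rowvar=False))

        # Step 2: distances of all points relative to the basic subset.
        d2 = squared_distances(X, center, cov)

        # Step 3: re-order; the first size + 1 points form the next basic subset.
        order = np.argsort(d2, kind="stable")

        # Assumed stopping rule: the next candidate is already too far away.
        if size >= half and d2[order[size]] > cutoff:
            return np.sort(order[size:])
        size += 1

    return np.array([], dtype=int)  # every observation joined the basic subset


# Toy univariate example (p = 1): ten evenly spaced values plus two distant ones.
# 6.63 is roughly the 0.99 quantile of a chi-square distribution with 1 df.
toy = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100, 101], dtype=float).reshape(-1, 1)
flagged = forward_outlier_search(toy, cutoff=6.63)
print(flagged)
```

Because the basic subset starts from the most central points and grows one observation at a time, a cluster of outliers cannot inflate the centre and covariance estimates before it is tested, which is how a procedure of this kind guards against masking. A fixed cutoff is a simplification; small basic subsets underestimate the spread, so a practical criterion would need some correction for subset size.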