DETECTING OUTLIER IN THE MULTIVARIATE DISTRIBUTION USING PRINCIPAL COMPONENTS
Aldwin M. Teves
International Journal of Engineering Science Technologies, journal article, published 2023-05-03
DOI: 10.29121/ijoest.v7.i2.2023.488
Drawing inferences from the data at hand is crucial, so it makes sense to discard spurious observations before applying statistical analysis. This study proposes a procedure for identifying outliers based on the principal components of the original variables. The variables are sorted and weighted according to the magnitude of their inner products with the principal components computed from the centered and scaled variables; the weights are the variances explained by the corresponding principal components. The measure of proximity among observations is proportional to the variances (eigenvalues) associated with the principal components. The method defines two distinct subintervals, and suspected outliers fall into one of them according to the proximity measure δo. On simulated data, the procedure detected 100 percent of the outliers when they came from a distinct distribution, and 98.7 percent when the outlier distribution had the same variance-covariance matrix as the outlier-free distribution but a slightly different mean vector.
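The abstract does not give the exact form of δo or of the two subintervals, so the following is only a minimal Python sketch of the general idea under stated assumptions. Variables are centered and scaled, principal components are taken from their correlation matrix, and each component is weighted by its proportion of explained variance, w_k = λ_k / Σλ. The distance δ_i = Σ_k w_k s_ik² (where s_ik is the score of observation i on component k), the helper names `jacobi_eigen`, `weighted_pc_distances`, and `flag_outliers`, and the rule that splits the sorted δ values at their widest gap are all hypothetical stand-ins, not the author's procedure. A small Jacobi eigensolver keeps the sketch dependency-free.

```python
import math
import random


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix via cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k] is
    the unit eigenvector belonging to values[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def weighted_pc_distances(x):
    """Hypothetical delta_i: squared PC scores weighted by explained-variance share."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / (n - 1)) for j in range(p)]
    # Center and scale each variable (assumes no constant columns).
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in x]
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    values, vectors = jacobi_eigen(corr)
    total = sum(values)
    weights = [lam / total for lam in values]  # proportion of variance per PC
    deltas = []
    for r in z:
        scores = [sum(r[j] * vec[j] for j in range(p)) for vec in vectors]
        deltas.append(sum(w * s * s for w, s in zip(weights, scores)))
    return deltas


def flag_outliers(x):
    """Hypothetical rule: split the sorted deltas at their widest gap, flag the upper part."""
    deltas = weighted_pc_distances(x)
    d = sorted(deltas)
    gaps = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    cutoff = (d[i] + d[i + 1]) / 2.0
    return [k for k, delta in enumerate(deltas) if delta > cutoff]


# Usage: 50 clean 3-D points plus two appended points far from the clean cloud.
rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
data = clean + [[10.0, 10.0, -10.0], [9.0, 11.0, -10.0]]
flagged = flag_outliers(data)
print(flagged)
```

Weighting by explained variance emphasizes deviations along the dominant components, which suits outliers displaced along the main correlation direction; a Mahalanobis-style score would instead divide each squared score by its eigenvalue. The widest-gap split is fragile when no clear gap exists, as in the abstract's equal-covariance, shifted-mean scenario.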