Estimating Outlier Score Probabilities
Richard A. Bauder, T. Khoshgoftaar
2017 IEEE International Conference on Information Reuse and Integration (IRI), published 2017-08-01
DOI: 10.1109/IRI.2017.19 (https://doi.org/10.1109/IRI.2017.19)

Outlier detection is a critical function across a diverse range of tasks and domains. There are numerous outlier detection methods, most of which produce scores that indicate whether an observation is an outlier or an inlier. These scores can be difficult to interpret and cannot be compared directly across methods. One solution is to convert the outlier scores to probabilities. These probability estimates provide understandable and meaningful results for assessing outlying values. Moreover, the probabilities from several outlier detection methods can be combined into an ensemble, further enhancing the detection of outliers. In this paper, we propose a unique approach that uses probabilistic programming to fit a 3-parameter Lognormal distribution to the original outlier scores. We provide empirical evidence for the use of this distribution, compare the probability estimates with the outlier scores, discuss confidence in these estimates, evaluate detection performance via the probabilities, and provide an ensemble detection example. Our research indicates that this approach reasonably models the original outlier scores, resulting in meaningful outlier probability estimates.
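
The idea of mapping raw outlier scores to probabilities through a 3-parameter Lognormal distribution can be illustrated with a minimal sketch. This is not the paper's method: the abstract describes a fit via probabilistic programming, whereas the code below swaps in a simple quantile-based point estimate of the threshold so that it runs with the Python standard library alone. The function names (`fit_lognormal3`, `lognormal3_cdf`, `ensemble_probability`), the fallback rule for the threshold, the use of the fitted CDF as the probability, the averaging rule for the ensemble, and the example scores are all assumptions made for illustration.

```python
import math
import statistics


def fit_lognormal3(scores):
    """Point-estimate (gamma, mu, sigma) so that log(x - gamma) ~ Normal(mu, sigma).

    Assumption: a quantile-based threshold estimate stands in for the
    Bayesian fit described in the abstract.
    """
    lo, hi = min(scores), max(scores)
    if lo == hi:
        raise ValueError("scores must not all be identical")
    med = statistics.median(scores)
    denom = lo + hi - 2.0 * med
    if denom > 0:
        # Threshold chosen so the median sits midway between min and max in log space.
        gamma = (lo * hi - med * med) / denom
    else:
        # Hypothetical fallback for samples that are not right-skewed.
        gamma = lo - 0.01 * (hi - lo)
    logs = [math.log(x - gamma) for x in scores]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    return gamma, mu, sigma


def lognormal3_cdf(x, gamma, mu, sigma):
    """CDF of the 3-parameter lognormal, read here as an outlier probability."""
    if x <= gamma:
        return 0.0
    z = (math.log(x - gamma) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def ensemble_probability(probabilities):
    """Combine per-detector probabilities by simple averaging (one possible rule)."""
    return statistics.fmean(probabilities)


# Made-up scores from a hypothetical detector; larger means more outlying.
scores = [1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.5, 6.0]
gamma, mu, sigma = fit_lognormal3(scores)
for s in scores:
    print(f"score={s:.2f}  probability={lognormal3_cdf(s, gamma, mu, sigma):.3f}")
```

Because the CDF is monotone in the score, the ranking produced by the detector is preserved, while the output lies in [0, 1] and can be compared or averaged across detectors whose raw scores live on different scales.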