PCA Tail as the Anomaly Indicator
O. Škvarek, M. Klimo, Jaroslav Kopčan
2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA), published 2020-11-12
DOI: 10.1109/ICETA51985.2020.9379267
Citations: 1
Abstract
Nowadays, tools based on machine learning have become an integral part of education. Proper application of these tools brings benefits, but misuse can be dangerous. A pattern recognition system always indicates the class most similar to the submitted pattern, based on the features extracted from the training set. Designers optimise recognisers for the specific classes of the training set. Users, however, may not be familiar with the preparation methodology, and thus they may apply the recognition system to samples incompatible with the training set (outliers, novelties, anomalies). This paper analyses the tail remaining after linear principal component analysis (PCA) as an anomaly indicator. A nonlinear approach based on generative adversarial networks (GAN) is also presented. In addition to the recognition result, the user also gets a level of its credibility, categorised into three classes: accept, do not decide, reject. As an example, Fashion-MNIST queries were submitted to a recogniser trained on the MNIST database. The proposed linear misuse detector refused all of them; the neural network-based detector failed in 4.81 % of queries. For a more detailed analysis, MNIST samples corrupted by Gaussian noise were presented to the misuse detector trained on the noiseless MNIST dataset. The experiments revealed a sharp border between acceptance and non-acceptance (no decision or rejection) decisions.
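One common way to read "the tail remaining after PCA" is as the energy left in a sample once its projection onto the leading principal components has been removed. The Python sketch below illustrates that reading together with a three-way accept / do not decide / reject decision. It is not the authors' implementation: the abstract does not state the number of retained components, the thresholds, or the scoring formula, so `k`, `t_accept`, `t_reject`, and all function names here are hypothetical, and synthetic data stands in for MNIST and Fashion-MNIST.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def tail_energy(X, mean, components):
    """Squared norm of the residual after projecting onto the kept components."""
    Xc = X - mean
    residual = Xc - (Xc @ components.T) @ components
    return np.sum(residual ** 2, axis=1)

def decide(scores, t_accept, t_reject):
    """Map tail energies to the three credibility classes."""
    return np.where(scores <= t_accept, "accept",
                    np.where(scores <= t_reject, "do not decide", "reject"))

rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for the training set, lying close to a
# 3-dimensional subspace of a 20-dimensional space.
basis = rng.normal(size=(3, 20))
train = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 20))

# Assumption: synthetic stand-in for incompatible queries (no subspace structure).
outliers = 3.0 * rng.normal(size=(100, 20))

mean, comps = fit_pca(train, k=3)
train_scores = tail_energy(train, mean, comps)

# Hypothetical thresholds: accept below the 99th percentile of the training
# tail energy, reject above twice that value, abstain in between.
t_accept = np.quantile(train_scores, 0.99)
t_reject = 2.0 * t_accept

labels_in = decide(train_scores, t_accept, t_reject)
labels_out = decide(tail_energy(outliers, mean, comps), t_accept, t_reject)

print("in-distribution accepted:", np.mean(labels_in == "accept"))
print("outliers rejected:", np.mean(labels_out == "reject"))
```

Deriving both thresholds from the training scores keeps the sketch self-contained. In practice they would more likely be tuned on held-out data, and the gap between them controls how wide the "do not decide" band is.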