An approach for outlier and novelty detection for text data based on classifier confidence
Authors: N. Pizurica, S. Tomovic
Journal: AI Communications, volume 39 1, pages 139-153 (JCR Q4, Computer Science, Artificial Intelligence; impact factor 1.4)
Published: 2020-12-18
DOI: 10.3233/aic-200649 (https://doi.org/10.3233/aic-200649)
In this paper we present an approach for novelty detection in text data. The approach can also be regarded as semi-supervised anomaly detection, because the training dataset contains labelled instances of the known classes only. A classification model is learned during the training phase, under the assumption that the training dataset contains at least two known classes. In the testing phase, instances are classified as normal or anomalous based on the classifier's confidence: if the classifier cannot assign any of the known class labels to a given instance with sufficiently high confidence (probability), the instance is declared a novelty (anomaly). We propose two procedures to objectively measure the classifier confidence. Experimental results show that the proposed approach is comparable to methods known in the literature.
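The decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the two confidence-measuring procedures, so the sketch assumes the simplest stand-in, a fixed cut-off on the highest class probability. The names `softmax` and `detect_novelty`, the example labels, and the threshold value 0.7 are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def detect_novelty(class_probs, labels, threshold=0.7):
    """Return the predicted label, or "novelty" if confidence is too low.

    class_probs: probabilities over the known classes.
    labels: names of the known classes, in the same order.
    threshold: hypothetical cut-off, not a value taken from the paper.
    """
    if len(labels) < 2:
        # The approach assumes at least two known classes.
        raise ValueError("at least two known classes are required")
    if len(class_probs) != len(labels):
        raise ValueError("one probability per known class is required")
    best = max(range(len(labels)), key=lambda i: class_probs[i])
    if class_probs[best] < threshold:
        return "novelty"  # no known class is confident enough
    return labels[best]


labels = ["sports", "politics"]      # hypothetical known classes
confident = softmax([4.0, 0.5])      # one class clearly dominates
uncertain = softmax([1.0, 0.9])      # near-even split between classes
print(detect_novelty(confident, labels))
print(detect_novelty(uncertain, labels))
```

In this sketch the first instance is assigned to a known class, while the second falls below the cut-off and is flagged as a novelty. In practice the threshold would have to be chosen from data, which is presumably where an objective confidence measure becomes useful.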
About the journal:
AI Communications is a journal on artificial intelligence (AI) with close ties to EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies.
AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. It publishes refereed articles on scientific and technical AI procedures, provided they are of sufficient interest to a large readership with both scientific and practical backgrounds. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies and news.