A Generic Anomaly Detection Approach Applied to Mixture-of-unigrams and Maritime Surveillance Data
Yifan Zhou, James Wright, S. Maskell
2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF), 2019-10-01
DOI: 10.1109/SDF.2019.8916633
Citation count: 2
Abstract
This paper proposes a new generic method for detecting anomalies (i.e., statistical outliers) that can be used with a generative topic model. We specify the method in the context of the Mixture-of-unigrams model, which is widely used in text mining. Anomalies can be detected with a topic model by applying a threshold to the likelihood. However, choosing that threshold is challenging, because a suitable value depends on both the similarity of the topics and the length of the documents. This paper describes a new, intuitive method for detecting anomalies that simply manipulates the output of the trained model; such an approach is expected to have parameters that are more intuitive to define for a given problem. To assess the utility of the proposed approach, we also present a use case that identifies ships misreporting their ship type, using geolocation data from Automatic Identification System (AIS) messages. We show that, if a model is trained on data from one type of ship, ships of another type can be identified as anomalous.
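The likelihood-threshold baseline described above, and why document length makes its threshold hard to choose, can be illustrated with a small sketch. It assumes a Mixture-of-unigrams model with already-trained mixing weights `pi` and per-topic word distributions `phi` (the toy values, the function names `log_likelihood` and `is_anomalous`, and the threshold of -20 are all hypothetical). Each document has a single topic, so log p(d) = log Σ_k π_k Π_w φ_{k,w}^{n_w}, where n_w is the count of word w in the document. This shows only the baseline the abstract argues against, not the paper's proposed method.

```python
import math
from collections import Counter


def log_likelihood(doc, pi, phi):
    """Log p(doc) under a Mixture-of-unigrams model (hypothetical helper).

    doc: list of word indices
    pi:  mixing weights over K topics (sum to 1)
    phi: K x V nested list, phi[k][w] = p(word w | topic k)
    """
    counts = Counter(doc)
    # Joint log-probability of the document with each topic:
    # log pi_k + sum_w n_w * log phi_{k,w}
    per_topic = [
        math.log(pi[k]) + sum(n * math.log(phi[k][w]) for w, n in counts.items())
        for k in range(len(pi))
    ]
    # Marginalise the single per-document topic with log-sum-exp for stability
    m = max(per_topic)
    return m + math.log(sum(math.exp(s - m) for s in per_topic))


def is_anomalous(doc, pi, phi, threshold):
    """Baseline rule: flag a document whose log-likelihood falls below a fixed threshold."""
    return log_likelihood(doc, pi, phi) < threshold


# Toy model (assumed values): two topics over a three-word vocabulary
pi = [0.5, 0.5]
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.2, 0.7]]

# A 10-word document typical of topic 0, and the same content repeated to 100 words
short_doc = [0] * 7 + [1] * 2 + [2]
long_doc = short_doc * 10

for name, doc in [("short", short_doc), ("long", long_doc)]:
    ll = log_likelihood(doc, pi, phi)
    print(name, len(doc), round(ll, 2), is_anomalous(doc, pi, phi, threshold=-20.0))
```

Because the log-likelihood is a sum over word tokens, the 100-word document scores far lower (about -80.9 versus -8.7) even though its word proportions are identical to the 10-word one, so the fixed threshold flags it. This is the length dependence that makes a single likelihood threshold hard to set.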