Hemank Lamba, Thomas J. Glazier, J. Cámara, B. Schmerl, D. Garlan, J. Pfeffer
{"title":"Model-based Cluster Analysis for Identifying Suspicious Activity Sequences in Software","authors":"Hemank Lamba, Thomas J. Glazier, J. Cámara, B. Schmerl, D. Garlan, J. Pfeffer","doi":"10.1145/3041008.3041014","DOIUrl":null,"url":null,"abstract":"Large software systems have to contend with a significant number of users who interact with different components of the system in various ways. The sequences of components that are used as part of an interaction define sets of behaviors that users have with the system. These can be large in number. Among these users, it is possible that there are some who exhibit anomalous behaviors -- for example, they may have found back doors into the system and are doing something malicious. These anomalous behaviors can be hard to distinguish from normal behavior because of the number of interactions a system may have, or because traces may deviate only slightly from normal behavior. In this paper we describe a model-based approach to cluster sequences of user behaviors within a system and to find suspicious, or anomalous, sequences. We exploit the underlying software architecture of a system to define these sequences. We further show that our approach is better at detecting suspicious activities than other approaches, specifically those that use unigrams and bigrams for anomaly detection. We show this on a simulation of a large scale system based on Amazon Web application style architecture.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3041008.3041014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Large software systems have to contend with a significant number of users who interact with different components of the system in various ways. The sequences of components that are used as part of an interaction define sets of behaviors that users have with the system. These can be large in number. Among these users, it is possible that there are some who exhibit anomalous behaviors -- for example, they may have found back doors into the system and are doing something malicious. These anomalous behaviors can be hard to distinguish from normal behavior because of the number of interactions a system may have, or because traces may deviate only slightly from normal behavior. In this paper we describe a model-based approach to cluster sequences of user behaviors within a system and to find suspicious, or anomalous, sequences. We exploit the underlying software architecture of a system to define these sequences. We further show that our approach is better at detecting suspicious activities than other approaches, specifically those that use unigrams and bigrams for anomaly detection. We show this on a simulation of a large scale system based on Amazon Web application style architecture.