Correlating network events and transferring labels in the presence of IP address anonymisation
Sebastian Abt, Harald Baier
2016 12th International Conference on Network and Service Management (CNSM)
Published: 2016-10-01
DOI: 10.1109/CNSM.2016.7818401 (https://doi.org/10.1109/CNSM.2016.7818401)
Citations: 4
Abstract
The availability of labelled data, i.e. ground-truth or reference data, is typically a requirement for network research, and for network security research in particular. Labelled data, however, are scarce. Data sets in repositories such as CAIDA or PREDICT mostly lack labels and have their IP addresses anonymised. The anonymisation in particular complicates correlating these data sets with third-party information in order to assign labels a posteriori. To address this problem, we propose a scheme that anonymises IP addresses such that later correlation remains possible without compromising the security of either data-sponsoring entity. The proposed scheme is based on Crypto-PAn [1] and correlates events using anonymised IP addresses as correlation keys, without restricting the choice of the cryptographic secret.
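
For readers unfamiliar with prefix-preserving anonymisation of the kind Crypto-PAn provides, the idea can be illustrated with a minimal sketch. This is not the scheme proposed in the paper, and it is not the Crypto-PAn reference implementation: as an assumption made for brevity, it substitutes HMAC-SHA256 for the AES-based pseudorandom function that Crypto-PAn uses. The function names (`prf_bit`, `anonymise_ipv4`, `common_prefix_len`) are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prf_bit(key: bytes, prefix_bits: str) -> int:
    """Return one pseudorandom bit derived from the key and a bit-string prefix.

    Assumption: HMAC-SHA256 stands in for Crypto-PAn's AES-based PRF.
    """
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] >> 7  # most significant bit of the first digest byte


def anonymise_ipv4(key: bytes, address: str) -> str:
    """Anonymise an IPv4 address so that shared prefixes are preserved.

    Bit i of the output is bit i of the input XORed with a pseudorandom bit
    that depends only on the key and on input bits 0..i-1.
    """
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i in range(32):
        flip = prf_bit(key, bits[:i])
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example secret"  # hypothetical secret, for illustration only
    x = anonymise_ipv4(key, "192.0.2.1")
    y = anonymise_ipv4(key, "192.0.2.200")
    # Both values below are equal: the shared prefix length is preserved.
    print(common_prefix_len("192.0.2.1", "192.0.2.200"), common_prefix_len(x, y))
```

In this sketch, a given address always maps to the same pseudonym under the same secret, which is what allows anonymised addresses to act as correlation keys within a single data set. Under two different secrets the pseudonyms will in general differ, which appears to be the obstacle the abstract refers to when two parties each anonymise their own data.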