Data Preparation for User Profiling from Traffic Log

Marek Kumpost
Published in: The International Conference on Emerging Security Information, Systems, and Technologies (SECUREWARE 2007), 2007-10-14
DOI: 10.1109/SECUREWARE.2007.4385316 (https://doi.org/10.1109/SECUREWARE.2007.4385316)
Citations: 7

Abstract

This paper presents our ongoing work on traffic log processing. Our goal is to find an approach to modelling user behaviour from the behavioural patterns recorded in traffic logs. Because the volume of input data is very large, effective preprocessing is crucial if the profiling is to yield significant results. We describe our approach to restricting the input data according to its relevance. We use histogram clustering to identify sets of users with similar communication frequencies; entropy and TF-IDF (term frequency-inverse document frequency) then help to select the destinations that are relevant for a given set of users. The main profiling is performed on the preprocessed data, and our experiments show that restricting the input in this way has a positive impact on the significance of the results.
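
To make the destination-selection idea concrete, the following is a minimal sketch of how entropy and TF-IDF could be computed over a traffic log. It is not the paper's implementation: the toy `LOG` data, the function names, the treatment of each user as a "document" and each destination as a "term", and the unsmoothed `log(N / df)` form of the inverse document frequency are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy log: one (user, destination) pair per observed connection.
LOG = [
    ("alice", "mail.example"), ("alice", "mail.example"), ("alice", "news.example"),
    ("bob", "mail.example"), ("bob", "git.example"), ("bob", "git.example"),
    ("carol", "mail.example"), ("carol", "news.example"),
]


def destination_entropy(log):
    """Shannon entropy (in bits) of how each destination's traffic is spread over users.

    Assumption: low entropy means the destination is used by few users,
    high entropy means its traffic is spread across many users.
    """
    per_dest = {}
    for user, dest in log:
        per_dest.setdefault(dest, Counter())[user] += 1
    result = {}
    for dest, counts in per_dest.items():
        total = sum(counts.values())
        result[dest] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return result


def tf_idf(log):
    """TF-IDF score per (user, destination), with users as documents (an assumption)."""
    per_user = {}
    for user, dest in log:
        per_user.setdefault(user, Counter())[dest] += 1
    n_users = len(per_user)
    # Document frequency: number of users that contacted each destination.
    df = Counter()
    for counts in per_user.values():
        df.update(counts.keys())
    scores = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        for dest, c in counts.items():
            tf = c / total                       # share of this user's traffic
            idf = math.log(n_users / df[dest])   # unsmoothed IDF
            scores[(user, dest)] = tf * idf
    return scores


entropy = destination_entropy(LOG)
scores = tf_idf(LOG)
```

In this toy data, `mail.example` is contacted by every user, so its IDF is zero and it contributes nothing to any user's TF-IDF score, whereas `git.example` is contacted only by `bob` and scores highest for him. Under the assumptions above, this is the sense in which such measures can filter out destinations that do not help distinguish users.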