Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: Latest Articles (Page 8)

Profiling DRDoS Attacks with Data Analytics Pipeline

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133155

Laure Berti-Équille, Yury Zhauniarovich

Abstract: A large number of Distributed Reflective Denial-of-Service (DRDoS) attacks are launched every day, and our understanding of the modus operandi of their perpetrators is still very limited, as we are submerged by so much data to analyze and do not have reliable and complete ways to validate our findings. In this paper, we propose a first analytic pipeline that enables us to cluster and characterize attack campaigns into several main profiles that exhibit similarities. These similarities are due to common technical properties of the underlying infrastructures used to launch these attacks. Although we do not have access to the ground truth and do not know how many perpetrators are acting behind the scenes, we can group their attacks based on relevant commonalities with cluster ensembling to estimate their number and capture their profiles over time. Specifically, our results show that we can repeatably identify and group together common profiles of attacks while considering domain experts' constraints in the cluster ensembles. From the obtained consensus clusters, we can generate comprehensive rules that characterize past campaigns and that can be used for classifying the next ones despite the evolving nature of the attacks. Such rules can be further used to filter out garbage traffic in Internet Service Provider networks.

Citations: 9
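
The abstract above mentions cluster ensembling but does not say how the base clusterings are combined. One common approach is a co-association (evidence accumulation) consensus, sketched below purely as an illustration under that assumption; it is not taken from the paper. The function names, the 0.5 threshold, and the toy labelings are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    fractions = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        fractions[(i, j)] = together / len(labelings)
    return fractions


def consensus_clusters(labelings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs,
    then return the connected components as consensus clusters."""
    n = len(labelings[0])
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), fraction in co_association(labelings).items():
        if fraction > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in range(n):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Three hypothetical base clusterings of five attacks (label ids are arbitrary).
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 1],
]
print(consensus_clusters(runs))
```

On this toy input, attacks 0 and 1 are grouped in every run, while attacks 2, 3, and 4 are chained together by pairs that co-occur in two of the three runs. Connected components are only one way to read clusters off the co-association values; hierarchical clustering on the same values is a frequent alternative.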

How Safe is Your (Taxi) Driver?

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133068

R. Stanojevic

Abstract: For an auto insurer, understanding the risk of individual drivers is a critical factor in building a healthy and profitable portfolio. For decades, assessing the risk of drivers has relied on demographic information, which allows the insurer to segment the market into several risk groups, each priced with an appropriate premium. In recent years, however, some insurers have started experimenting with so-called Usage-Based Insurance (UBI), in which the insurer monitors a number of additional variables (mostly related to location) and uses them to better assess the risk of the drivers. While several studies have reported results on UBI trials, these studies keep the studied data confidential (for obvious privacy and business concerns), which inevitably limits their reproducibility and their interest to the data-mining community. In this paper we discuss a methodology for studying driver risk assessment using a public dataset of 173M taxi rides in NYC with over 40K drivers. Our approach to risk assessment utilizes not only the location data (which is significantly sparser than what is normally exploited in UBI) but also the revenue, tips, and overall activity of the drivers (as proxies of their behavioral traits), and obtains risk scoring accuracy on par with the reported results on non-professional driver cohorts, in spite of sparser location data and no demographic information about the drivers.

Citations: 1

Ranking Rich Mobile Verticals based on Clicks and Abandonment

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133059

Mami Kawasaki, Inho Kang, T. Sakai

Abstract: We consider the problem of ranking rich verticals, which we call "cards," for a given mobile search query. Examples of card types include "SHOP" (showing access and contact information of a shop), "WEATHER" (showing a weather forecast for a particular location), and "TV" (showing information about a TV programme). These cards can be highly visual and/or concise, and may often satisfy the user's information need without making her click on them. While this "good abandonment" of the search engine result page is ideal, especially for mobile environments where the interaction between the user and the search engine should be minimal, it poses a challenge for search engine companies whose ranking algorithms rely heavily on click data. In order to provide the right card types to the user for a given query, we propose a graph-based approach which extends a click-based automatic relevance estimation algorithm of Agrawal et al. by incorporating an abandonment-based preference rule. Using a real mobile query log from a commercial search engine, we constructed a data set containing 2,472 pairwise card type preferences covering 992 distinct queries, by hiring three independent assessors. Our proposed method outperforms a click-only baseline by 53-68% in terms of card type preference accuracy. The improvement is also statistically highly significant, with p ≈ 0.0000 according to the paired randomisation test.

Citations: 1

Highly Efficient Mining of Overlapping Clusters in Signed Weighted Networks

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133004

Tuan-Anh Hoang, Ee-Peng Lim

Abstract: In many practical contexts, networks are weighted, as their links are assigned numerical weights representing relationship strengths or intensities of inter-node interaction. Moreover, a link's weight can be positive or negative, depending on the relationship or interaction between the connected nodes. The existing methods for network clustering, however, are not ideal for handling very large signed weighted networks. In this paper, we present a novel method called LPOCSIN (short for "Linear Programming based Overlapping Clustering on Signed Weighted Networks") for efficient mining of overlapping clusters in signed weighted networks. Unlike existing methods that rely on computationally expensive cluster cohesiveness measures, LPOCSIN utilizes a simple yet effective one. Using this measure, we transform the cluster assignment problem into a series of alternating linear programs, and further propose a highly efficient procedure for solving those alternating problems. We evaluate LPOCSIN and other state-of-the-art methods by extensive experiments covering a wide range of synthetic and real networks. The experiments show that LPOCSIN significantly outperforms the other methods in recovering ground-truth clusters while being an order of magnitude faster than the most efficient state-of-the-art method.

Citations: 3

Collecting Non-Geotagged Local Tweets via Bandit Algorithms

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133046

Saki Ueda, Yuto Yamaguchi, H. Kitagawa

Citations: 1

Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133149

Luca Soldaini, Andrew Yates, Nazli Goharian

Citations: 11

Relaxing Graph Pattern Matching With Explanations

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132992

Jia Li, Yang Cao, Shuai Ma

Abstract: Traditional graph pattern matching is based on subgraph isomorphism, which is often too restrictive to identify meaningful matches. To handle this, taxonomy subgraph isomorphism has been proposed to relax the label constraints in the matching. Nonetheless, there are many cases that cannot be covered. In this study, we first formalize taxonomy simulation, a natural matching semantics combining graph simulation with taxonomy, and propose its pattern relaxation to enrich graph pattern matching results with taxonomy information. We also design topological ranking and diversified topological ranking for top-k relaxations. We then study the top-k pattern relaxation problems by providing their static analyses and developing algorithms and optimizations for finding and evaluating top-k pattern relaxations. We further propose a notion of explanations for answers to the relaxations and develop algorithms to compute explanations. Together, these give us a framework for enriching the results of graph pattern matching. Using real-life datasets, we experimentally verify that our framework and techniques are effective and efficient for identifying meaningful matches in practice.

Citations: 15

DeepHawkes: Bridging the Gap between Prediction and Understanding of Information Cascades

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132973

Qi Cao, Huawei Shen, Keting Cen, W. Ouyang, Xueqi Cheng

Abstract: Online social media remarkably facilitates the production and delivery of information, intensifying the competition among vast amounts of information for users' attention and highlighting the importance of predicting the popularity of information. Existing approaches for popularity prediction fall into two paradigms: feature-based approaches and generative approaches. Feature-based approaches extract various features (e.g., user, content, structural, and temporal features) and predict the future popularity of information by training a regression/classification model. Their predictive performance depends heavily on the quality of hand-crafted features. In contrast, generative approaches are devoted to characterizing and modeling the process by which a piece of information accrues attention, making it easy to understand the underlying mechanisms governing the popularity dynamics of information cascades. But they have less desirable predictive power, since they are not optimized for popularity prediction. In this paper, we propose DeepHawkes to combat the defects of existing methods, leveraging end-to-end deep learning to make an analogy to the interpretable factors of the Hawkes process, a widely used generative process for modeling information cascades. DeepHawkes inherits the high interpretability of the Hawkes process and possesses the high predictive power of deep learning methods, bridging the gap between prediction and understanding of information cascades. We verify the effectiveness of DeepHawkes by applying it to predict retweet cascades of Sina Weibo and citation cascades of a longitudinal citation dataset. Experimental results demonstrate that DeepHawkes outperforms both feature-based and generative approaches.

Citations: 182
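
For readers unfamiliar with the Hawkes process named in the abstract above, the sketch below evaluates the classical self-exciting intensity with an exponential kernel, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). This is the textbook form only, not the DeepHawkes model itself; the parameter values and the retweet timestamps are made up for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity at time t of a Hawkes process with exponential kernel.

    mu    : background rate of new events
    alpha : jump in intensity caused by each past event
    beta  : decay rate of that jump over time
    Only events strictly before t contribute.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t
    )


# Hypothetical retweet timestamps (arbitrary time units).
retweet_times = [0.0, 0.5, 0.6, 3.0]
for t in (0.7, 2.9, 3.1):
    print(f"t={t:.1f}  intensity={hawkes_intensity(t, retweet_times):.3f}")
```

Each past event raises the intensity by `alpha`, and that contribution then decays at rate `beta`, so the intensity just after the retweet at time 3.0 is higher than just before it. These per-event influence and time-decay terms are the kind of interpretable factors the abstract refers to.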

PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132901

Yingfan Liu, Hong Cheng, Jiangtao Cui

Abstract: Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the computer vision community, due to their good balance between accuracy and space requirements. PQ based methods embed a high-dimensional vector into a short binary code (called a PQ code), and the squared Euclidean distance is estimated by the asymmetric quantizer distance (AQD) with high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data that does not fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called a PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of PQB+-trees. According to experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.

Citations: 12
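
The abstract above relies on PQ codes and the asymmetric quantizer distance (AQD). The sketch below shows the generic product-quantization idea: database vectors are replaced by per-subspace centroid indices, while the query stays unquantized and is compared against centroids through lookup tables. It does not cover the PQB+-forest index, and the tiny hand-written codebooks and vectors are hypothetical; real codebooks would normally be learned, for example with k-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebooks):
    """PQ code: for each subvector, the index of its nearest centroid."""
    d_sub = len(vector) // len(codebooks)
    code = []
    for s, centroids in enumerate(codebooks):
        sub = vector[s * d_sub:(s + 1) * d_sub]
        nearest = min(range(len(centroids)), key=lambda k: sq_dist(sub, centroids[k]))
        code.append(nearest)
    return code


def distance_tables(query, codebooks):
    """Per-subspace squared distances from the raw query to every centroid."""
    d_sub = len(query) // len(codebooks)
    return [
        [sq_dist(query[s * d_sub:(s + 1) * d_sub], c) for c in centroids]
        for s, centroids in enumerate(codebooks)
    ]


def asymmetric_distance(code, tables):
    """Estimated squared distance: one table lookup per subspace, summed."""
    return sum(tables[s][k] for s, k in enumerate(code))


# Hypothetical codebooks: 2 subspaces of dimension 2, with 2 centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [2.0, 2.0]],
]
database = [[0.1, 0.0, 0.0, 0.9], [0.9, 1.1, 2.1, 1.8]]
codes = [encode(v, codebooks) for v in database]

query = [1.0, 1.0, 2.0, 2.0]
tables = distance_tables(query, codebooks)
print(codes)
print([asymmetric_distance(c, tables) for c in codes])
```

The distance is called asymmetric because only the database side is quantized. The tables are built once per query, after which each database vector costs one lookup per subspace instead of a full distance computation.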

J-REED: Joint Relation Extraction and Entity Disambiguation

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133090

Dat Ba Nguyen, M. Theobald, G. Weikum

Citations: 10