Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: Latest Papers

GPU-Accelerated Graph Clustering via Parallel Label Propagation
Yusuke Kozawa, T. Amagasa, H. Kitagawa
{"title":"GPU-Accelerated Graph Clustering via Parallel Label Propagation","authors":"Yusuke Kozawa, T. Amagasa, H. Kitagawa","doi":"10.1145/3132847.3132960","DOIUrl":"https://doi.org/10.1145/3132847.3132960","url":null,"abstract":"Graph clustering has recently attracted much attention as a technique to extract community structures from various kinds of graph data. Since available graph data becomes increasingly large, the acceleration of graph clustering is an important issue for handling large-scale graphs. To this end, this paper proposes a fast graph clustering method using GPUs. The proposed method is based on parallelization of label propagation, one of the fastest graph clustering algorithms. Our method has the following three characteristics: (1) efficient parallelization: the algorithm of label propagation is transformed into a sequence of data-parallel primitives; (2) load balance: the method takes into account load balancing by adopting the primitives that make the load among threads and blocks well balanced; and (3) out-of-core processing: we also develop algorithms to efficiently deal with large-scale datasets that do not fit into GPU memory. Moreover, this GPU out-of-core algorithm is extended to simultaneously exploit both CPUs and GPUs for further performance gain. 
Extensive experiments with real-world and synthetic datasets show that our proposed method outperforms an existing parallel CPU implementation by a factor of up to 14.3 without sacrificing accuracy.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88686283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Profiling DRDoS Attacks with Data Analytics Pipeline
Laure Berti-Équille, Yury Zhauniarovich
{"title":"Profiling DRDoS Attacks with Data Analytics Pipeline","authors":"Laure Berti-Équille, Yury Zhauniarovich","doi":"10.1145/3132847.3133155","DOIUrl":"https://doi.org/10.1145/3132847.3133155","url":null,"abstract":"A large amount of Distributed Reflective Denial-of-Service (DRDoS) attacks are launched every day, and our understanding of the modus operandi of their perpetrators is yet very limited as we are submerged with so Big Data to analyze and do not have reliable and complete ways to validate our findings. In this paper, we propose a first analytic pipeline that enables us to cluster and characterize attack campaigns into several main profiles that exhibit similarities. These similarities are due to common technical properties of the underlying infrastructures used to launch these attacks. Although we do not have access to the ground truth and we do not know how many perpetrators are acting behind the scene, we can group their attacks based on relevant commonalities with cluster ensembling to estimate their number and capture their profiles over time. Specifically, our results show that we can repeatably identify and group together common profiles of attacks while considering domain expert's constraint in the cluster ensembles. From the obtained consensus clusters, we can generate comprehensive rules that characterize past campaigns and that can be used for classifying the next ones despite the evolving nature of the attacks. 
Such rules can be further used to filter out garbage traffic in Internet Service Provider networks.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86011857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
How Safe is Your (Taxi) Driver?
R. Stanojevic
{"title":"How Safe is Your (Taxi) Driver?","authors":"R. Stanojevic","doi":"10.1145/3132847.3133068","DOIUrl":"https://doi.org/10.1145/3132847.3133068","url":null,"abstract":"For an auto insurer, understanding the risk of individual drivers is a critical factor in building a healthy and profitable portfolio. For decades, assessing the risk of drivers has relied on demographic information which allows the insurer to segment the market in several risk groups priced with an appropriate premium. In the recent years, however, some insurers started experimenting with so called Usage-Based Insurance (UBI) in which the insurer monitors a number of additional variables (mostly related to the location) and uses them to better assess the risk of the drivers. While several studies have reported results on the UBI trials these studies keep the studied data confidential (for obvious privacy and business concerns) which inevitably limits their reproducibility and interest by the data-mining community. In this paper we discuss a methodology for studying driver risk assessment using a public dataset of 173M taxi rides in NYC with over 40K drivers. 
Our approach for risk assessment utilizes not only the location data (which is significantly sparser than what is normally exploited in UBI) but also the revenue, tips and overall activity of the drivers (as proxies of their behavioral traits) and obtain risk scoring accuracy on par with the reported results on non-professional driver cohorts in spite of sparser location data and no demographic information about the drivers.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"72 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86090111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learning Knowledge Embeddings by Combining Limit-based Scoring Loss
Xiaofei Zhou, Qiannan Zhu, Ping Liu, Li Guo
{"title":"Learning Knowledge Embeddings by Combining Limit-based Scoring Loss","authors":"Xiaofei Zhou, Qiannan Zhu, Ping Liu, Li Guo","doi":"10.1145/3132847.3132939","DOIUrl":"https://doi.org/10.1145/3132847.3132939","url":null,"abstract":"In knowledge graph embedding models, the margin-based ranking loss as the common loss function is usually used to encourage discrimination between golden triplets and incorrect triplets, which has proved effective in many translation-based models for knowledge graph embedding. However, we find that the loss function cannot ensure the fact that the scoring of correct triplets must be low enough to fulfill the translation. In this paper, we present a limit-based scoring loss to provide lower scoring of a golden triplet, and then to extend two basic translation models TransE and TransH, separately to TransE-RS and TransH-RS by combining limit-based scoring loss with margin-based ranking loss. Both the presented models have low complexities of parameters benefiting for application on large scale graphs. In experiments, we evaluate our models on two typical tasks including triplet classification and link prediction, and also analyze the scoring distributions of positive and negative triplets by different models. 
Experimental results show that the introduced limit-based scoring loss is effective to improve the capacities of knowledge graph embedding.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81828128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Relaxing Graph Pattern Matching With Explanations
Jia Li, Yang Cao, Shuai Ma
{"title":"Relaxing Graph Pattern Matching With Explanations","authors":"Jia Li, Yang Cao, Shuai Ma","doi":"10.1145/3132847.3132992","DOIUrl":"https://doi.org/10.1145/3132847.3132992","url":null,"abstract":"Traditional graph pattern matching is based on subgraph isomorphism, which is often too restrictive to identify meaningful matches. To handle this, taxonomy subgraph isomorphism has been proposed to relax the label constraints in the matching. Nonetheless, there are many cases that cannot be covered. In this study, we first formalize taxonomy simulation, a natural matching semantics combining graph simulation with taxonomy, and propose its pattern relaxation to enrich graph pattern matching results with taxonomy information. We also design topological ranking and diversified topological ranking for top-k relaxations. We then study the top-k pattern relaxation problems, by providing their static analyses, and developing algorithms and optimization for finding and evaluating top-k pattern relaxations. We further propose a notion of explanations for answers to the relaxations and develop algorithms to compute explanations. These together give us a framework for enriching the results of graph pattern matching. 
Using real-life datasets, we experimentally verify that our framework and techniques are effective and efficient for identifying meaningful matches in practice.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84669568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Attentive Graph-based Recursive Neural Network for Collective Vertex Classification
Qiongkai Xu, Qing Wang, Chenchen Xu, Lizhen Qu
{"title":"Attentive Graph-based Recursive Neural Network for Collective Vertex Classification","authors":"Qiongkai Xu, Qing Wang, Chenchen Xu, Lizhen Qu","doi":"10.1145/3132847.3133081","DOIUrl":"https://doi.org/10.1145/3132847.3133081","url":null,"abstract":"Vertex classification is a critical task in graph analysis, where both contents and linkage of vertices are incorporated during classification. Recently, researchers proposed using deep neural network to build an end-to-end framework, which can capture both local content and structure information. These approaches were proved effective in incorporating semantic meanings of neighbouring vertices, while the usefulness of this information was not properly considered. In this paper, we propose an Attentive Graph-based Recursive Neural Network (AGRNN), which exerts attention on neural network to make our model focus on vertices with more relevant semantic information. We evaluated our approach on three real-world datasets and also datasets with synthetic noise. Our experimental results show that AGRNN achieves the state-of-the-art performance, in terms of effectiveness and robustness. We have also illustrated some attention weight samples to demonstrate the rationality of our model.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"135 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90622350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Ranking Rich Mobile Verticals based on Clicks and Abandonment
Mami Kawasaki, Inho Kang, T. Sakai
{"title":"Ranking Rich Mobile Verticals based on Clicks and Abandonment","authors":"Mami Kawasaki, Inho Kang, T. Sakai","doi":"10.1145/3132847.3133059","DOIUrl":"https://doi.org/10.1145/3132847.3133059","url":null,"abstract":"We consider the problem of ranking rich verticals, which we call \"cards,\" for a given mobile search query. Examples of card types include \"SHOP\" (showing access and contact information of a shop), \"WEATHER\" (showing a weather forecast for a particular location), and \"TV\" (showing information about a TV programme). These cards can be highly visual and/or concise, and may often satisfy the user's information need without making her click on them. While this \"good abandonment\" of the search engine result page is ideal especially for mobile environments where the interaction between the user and the search engine should be minimal, it poses a challenge for search engine companies whose ranking algorithms rely heavily on click data. In order to provide the right card types to the user for a given query, we propose a graph-based approach which extends a click-based automatic relevance estimation algorithm of Agrawal et al., by incorporating an abandonment-based preference rule. Using a real mobile query log from a commercial search engine, we constructed a data set containing 2,472 pairwise card type preferences covering 992 distinct queries, by hiring three independent assessors. Our proposed method outperforms a click-only baseline by 53-68% in terms of card type preference accuracy. 
The improvement is also statistically highly significant, with p ≈ 0.0000 according to the paired randomisation test.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84069596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model
Luca Soldaini, Andrew Yates, Nazli Goharian
{"title":"Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model","authors":"Luca Soldaini, Andrew Yates, Nazli Goharian","doi":"10.1145/3132847.3133149","DOIUrl":"https://doi.org/10.1145/3132847.3133149","url":null,"abstract":"The rapid increase of medical literature poses a significant challenge for physicians, who have repeatedly reported to struggle to keep up to date with developments in research. This gap is one of the main challenges in integrating recent advances in clinical research with day-to-day practice. Thus, the need for clinical decision support (CDS) search systems that can retrieve highly relevant medical literature given a clinical note describing a patient has emerged. However, clinical notes are inherently noisy, thus not being fit to be used as queries as-is. In this work, we present a convolutional neural model aimed at improving clinical notes representation, making them suitable for document retrieval. The system is designed to predict, for each clinical note term, its importance in relevant documents. The approach was evaluated on the 2016 TREC CDS dataset, where it achieved a 37% improvement in infNDCG over state-of-the-art query reduction methods and a 27% improvement over the best known method for the task.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84244410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Highly Efficient Mining of Overlapping Clusters in Signed Weighted Networks
Tuan-Anh Hoang, Ee-Peng Lim
{"title":"Highly Efficient Mining of Overlapping Clusters in Signed Weighted Networks","authors":"Tuan-Anh Hoang, Ee-Peng Lim","doi":"10.1145/3132847.3133004","DOIUrl":"https://doi.org/10.1145/3132847.3133004","url":null,"abstract":"In many practical contexts, networks are weighted as their links are assigned numerical weights representing relationship strengths or intensities of inter-node interaction. Moreover, the links' weight can be positive or negative, depending on the relationship or interaction between the connected nodes. The existing methods for network clustering however are not ideal for handling very large signed weighted networks. In this paper, we present a novel method called LPOCSIN (short for \"Linear Programming based Overlapping Clustering on Signed Weighted Networks\") for efficient mining of overlapping clusters in signed weighted networks. Different from existing methods that rely on computationally expensive cluster cohesiveness measures, LPOCSIN utilizes a simple yet effective one. Using this measure, we transform the cluster assignment problem into a series of alternating linear programs, and further propose a highly efficient procedure for solving those alternating problems. We evaluate LPOCSIN and other state-of-the-art methods by extensive experiments covering a wide range of synthetic and real networks. 
The experiments show that LPOCSIN significantly outperforms the other methods in recovering ground-truth clusters while being an order of magnitude faster than the most efficient state-of-the-art method.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84274203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
DeepHawkes: Bridging the Gap between Prediction and Understanding of Information Cascades
Qi Cao, Huawei Shen, Keting Cen, W. Ouyang, Xueqi Cheng
{"title":"DeepHawkes: Bridging the Gap between Prediction and Understanding of Information Cascades","authors":"Qi Cao, Huawei Shen, Keting Cen, W. Ouyang, Xueqi Cheng","doi":"10.1145/3132847.3132973","DOIUrl":"https://doi.org/10.1145/3132847.3132973","url":null,"abstract":"Online social media remarkably facilitates the production and delivery of information, intensifying the competition among vast information for users' attention and highlighting the importance of predicting the popularity of information. Existing approaches for popularity prediction fall into two paradigms: feature-based approaches and generative approaches. Feature-based approaches extract various features (e.g., user, content, structural, and temporal features), and predict the future popularity of information by training a regression/classification model. Their predictive performance heavily depends on the quality of hand-crafted features. In contrast, generative approaches devote to characterizing and modeling the process that a piece of information accrues attentions, offering us high ease to understand the underlying mechanisms governing the popularity dynamics of information cascades. But they have less desirable predictive power since they are not optimized for popularity prediction. In this paper, we propose DeepHawkes to combat the defects of existing methods, leveraging end-to-end deep learning to make an analogy to interpretable factors of Hawkes process --- a widely-used generative process to model information cascade. DeepHawkes inherits the high interpretability of Hawkes process and possesses the high predictive power of deep learning methods, bridging the gap between prediction and understanding of information cascades. We verify the effectiveness of DeepHawkes by applying it to predict retweet cascades of Sina Weibo and citation cascades of a longitudinal citation dataset. 
Experimental results demonstrate that DeepHawkes outperforms both feature-based and generative approaches.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"114 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88146924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 182