Proceedings of the 21st ACM international conference on Information and knowledge management: latest papers, page 5

Graph classification: a diversified discriminative feature selection approach

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396791

Yuanyuan Zhu, J. Yu, Hong Cheng, Lu Qin

Abstract: A graph models complex structural relationships among objects and is widely used across a range of applications. Building an automated graph classification model is therefore important for predicting the class of unknown graphs and for understanding the structural differences between classes. The widely used graph classification framework consists of two steps: feature selection and classification. The key issue is how to select important subgraph features from a graph database containing a large number of positive and negative graphs. Given the selected features, a generic classification approach can be used to build a classification model. In this paper, we focus on feature selection. We identify two main issues with the most widely used feature selection approach, which selects frequent subgraph features based on a discriminative score, and introduce a new diversified discriminative score to select features with higher diversity. We analyze the properties of the proposed diversified discriminative score and conduct extensive performance studies to demonstrate that it makes positive and negative graphs separable and leads to higher classification accuracy.

Citations: 40
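The abstract of "Graph classification: a diversified discriminative feature selection approach" does not define its diversified discriminative score, so the following is only an illustrative sketch of the general idea of trading discriminative power against redundancy. Everything in it is an assumption made for illustration: the function name `select_diverse_features`, the representation of a feature as the sets of positive and negative graph ids that contain it, and the greedy gain (fraction of not-yet-covered positive graphs minus fraction of negative graphs). It is not the score proposed in the paper.

```python
from typing import Dict, List, Set, Tuple

# A candidate subgraph feature, described only by which graphs contain it:
# (ids of positive graphs containing it, ids of negative graphs containing it).
Feature = Tuple[Set[int], Set[int]]


def select_diverse_features(
    features: Dict[str, Feature], n_pos: int, n_neg: int, k: int
) -> List[str]:
    """Greedily pick up to k features (hypothetical scoring, not the paper's).

    Each round picks the feature with the largest gain, where the gain is the
    fraction of positive graphs it newly covers minus the fraction of negative
    graphs it appears in. Counting only newly covered positive graphs is what
    discourages picking redundant features.
    """
    covered: Set[int] = set()       # positive graphs covered so far
    chosen: List[str] = []
    remaining = dict(features)      # copy, so the caller's dict is untouched

    def gain(name: str) -> float:
        pos, neg = remaining[name]
        return len(pos - covered) / n_pos - len(neg) / n_neg

    while remaining and len(chosen) < k:
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break                   # no remaining feature adds anything useful
        chosen.append(best)
        covered |= remaining.pop(best)[0]
    return chosen


if __name__ == "__main__":
    candidates = {
        "f1": ({1, 2, 3}, {10}),    # overlaps entirely with f2 on positives
        "f2": ({1, 2, 3}, set()),
        "f3": ({4, 5}, {10}),       # weaker alone, but covers new graphs
    }
    print(select_diverse_features(candidates, n_pos=5, n_neg=4, k=2))
```

In this toy input, ranking by the undiversified difference alone would place `f1` ahead of `f3`, even though `f1` covers exactly the same positive graphs as `f2`; the coverage-aware gain prefers `f3` as the second feature.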

Map to humans and reduce error: crowdsourcing for deduplication applied to digital libraries

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398554

Mihai Georgescu, Dang Duc Pham, C. S. Firan, W. Nejdl, Julien Gaugaz

Abstract: Detecting duplicate entities, usually by examining metadata, has been the focus of much recent work. Several methods try to identify duplicate entities while focusing either on accuracy or on efficiency and speed, and there is still no perfect solution. We propose a combined, layered approach for duplicate detection whose main advantage is the use of crowdsourcing as a training and feedback mechanism. By applying active learning techniques to human-provided examples, we fine-tune our algorithm toward better duplicate detection accuracy. We keep the training cost low by gathering training data on demand for borderline cases or inconclusive assessments. We apply our simple and powerful methods to an online publication search system. First, we perform a coarse duplicate detection in real time, relying on publication signatures. Then, a second automatic step compares duplicate candidates and increases accuracy while adjusting based on feedback from both our online users and crowdsourcing platforms. Our approach shows an improvement of 14% over the untrained setting and is within 4% of the human assessors in accuracy.

Citations: 11
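The layered pipeline described in the abstract of "Map to humans and reduce error" (coarse signature-based grouping, pairwise comparison of candidates, human judgment only for borderline cases) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the signature (first three title words plus year), the use of `difflib.SequenceMatcher` as the similarity measure, the thresholds `low` and `high`, and the record fields `id`, `title`, and `year` are all hypothetical. The active learning step that would adjust the thresholds from crowd answers is omitted.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def signature(title: str, year: int) -> str:
    """Coarse blocking key (hypothetical): first three title words plus year."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(words[:3]) + f"|{year}"


def classify_pairs(
    records: List[Dict], low: float = 0.6, high: float = 0.9
) -> Tuple[List[Pair], List[Pair]]:
    """Return (automatic duplicates, borderline pairs to send to the crowd)."""
    # Step 1: coarse detection. Only records sharing a signature are compared.
    blocks: Dict[str, List[Dict]] = defaultdict(list)
    for rec in records:
        blocks[signature(rec["title"], rec["year"])].append(rec)

    duplicates: List[Pair] = []
    ask_crowd: List[Pair] = []
    # Step 2: pairwise comparison of the candidates inside each block.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, a["title"].lower(), b["title"].lower()
            ).ratio()
            if score >= high:
                duplicates.append((a["id"], b["id"]))   # confident match
            elif score >= low:
                ask_crowd.append((a["id"], b["id"]))    # inconclusive: ask humans
            # Below `low`, the pair is treated as distinct.
    return duplicates, ask_crowd


if __name__ == "__main__":
    full = ("Map to humans and reduce error: crowdsourcing for "
            "deduplication applied to digital libraries")
    records = [
        {"id": 1, "title": full, "year": 2012},
        {"id": 2, "title": full.upper(), "year": 2012},
        {"id": 3, "title": full[:63], "year": 2012},  # truncated title
        {"id": 4, "title": "Continuous top-k query for graph streams", "year": 2012},
    ]
    print(classify_pairs(records))
```

Routing only the middle band of scores to human assessors is one way to keep the number of crowd tasks small, which is the cost argument the abstract makes.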

Continuous top-k query for graph streams

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398717

Shirui Pan, Xingquan Zhu

Abstract: In this paper, we propose to query correlated graphs in a data stream scenario, where an algorithm is required to retrieve the top-k graphs that are most correlated with a query graph q. Due to the dynamically changing nature of stream data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. We propose a novel algorithm, Hoe-PGPL, to identify the top-k correlated graphs from a data stream by using a sliding window that covers a number of consecutive batches of stream data records. Our approach employs the Hoeffding bound to discover potential candidates and uses two-level candidate checking (one at the level of the whole sliding window and one at the level of the local data batch) to accurately estimate the correlation of emerging candidate patterns without rechecking historical stream data. Experimental results demonstrate that the proposed algorithm not only achieves good query precision and recall, but is also several times, or even an order of magnitude, more efficient than the straightforward algorithm in terms of time and memory consumption. Our method represents the first research endeavor on data-stream-based top-k correlated graph queries.

Citations: 11
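The abstract of "Continuous top-k query for graph streams" relies on the Hoeffding bound to shortlist candidates. For n independent observations of a quantity with range R, the one-sided bound gives a margin ε = sqrt(R² ln(1/δ) / (2n)) that holds with probability at least 1 − δ. The sketch below computes that margin; the pruning rule `may_reach_top_k` is a hypothetical illustration of how such a margin could be used to keep or drop a candidate, and is not taken from Hoe-PGPL, whose details the abstract does not give.

```python
import math


def hoeffding_epsilon(value_range: float, n: int, delta: float) -> float:
    """One-sided Hoeffding margin for the mean of n bounded observations.

    With probability at least 1 - delta, the true mean exceeds the sample
    mean by less than the returned epsilon.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def may_reach_top_k(estimate: float, epsilon: float, kth_best: float) -> bool:
    """Hypothetical pruning rule (an assumption, not the paper's): keep a
    candidate only if its optimistic value, estimate + epsilon, could still
    reach the current k-th best score."""
    return estimate + epsilon >= kth_best


if __name__ == "__main__":
    # 200 graphs seen so far, scores assumed to lie in [0, 1], 95% confidence.
    eps = hoeffding_epsilon(value_range=1.0, n=200, delta=0.05)
    print(f"epsilon = {eps:.4f}")
    print(may_reach_top_k(estimate=0.40, epsilon=eps, kth_best=0.60))
```

Because ε shrinks in proportion to 1/sqrt(n), the margin tightens as more stream records fall inside the window, so fewer candidates survive the check.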

A comprehensive analysis of parameter settings for novelty-biased cumulative gain

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398550

Teerapong Leelanupab, G. Zuccon, J. Jose

Citations: 9

User engagement: the network effect matters!

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396763

R. Baeza-Yates, M. Lalmas

Abstract: In the online world, user engagement refers to the quality of the user experience that emphasizes the positive aspects of the interaction with a web application and, in particular, the phenomena associated with wanting to use that application longer and more frequently. This definition is motivated by the observation that successful web applications are not just used; they are engaged with. Users invest time, attention, and emotion in them. Online providers aim to engage users not only with each service but across all services in their network. They spend increasing effort directing users to various services (e.g., using hyperlinks to help users navigate to and explore other services) to increase user traffic between their services. Nothing is known about how users engage across such a network of web sites, something we call networked user engagement. We address this problem by combining techniques from web analytics and mining, information retrieval evaluation, and existing work on user engagement from the domains of information science, multimodal human-computer interaction, and cognitive psychology. In this way, we can combine insights from big data with deep analysis of human behavior in the lab or through crowdsourcing experiments.

Citations: 17

WiSeNet: building a Wikipedia-based semantic network with ontologized relations

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398495

A. Moro, Roberto Navigli

Citations: 36

A unified optimization framework for auction and guaranteed delivery in online advertising

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398561

Konstantin Salomatin, Tie-Yan Liu, Yiming Yang

Abstract: This paper proposes a new unified optimization framework combining pay-per-click auctions and guaranteed delivery in sponsored search. Advertisers usually have different (and sometimes mixed) marketing goals: brand awareness and direct response. Different mechanisms are good at addressing different goals; for example, guaranteed delivery was often used to build brand awareness, and pay-per-click auctions were widely used for direct marketing. Our new method accommodates both in a unified framework, with search engine revenue as the optimization objective. In this way, we can target a guaranteed number of ad clicks (or impressions) per campaign for advertisers willing to pay a premium and enable keyword auctions for all others. Specifically, we formulate this joint optimization problem using linear programming, with a column generation strategy for efficiency. To select the best column (a ranked list of ads) given a query, we propose a novel dynamic programming algorithm that takes the special structure of the ad allocation and pricing mechanisms into account. We have tested the proposed framework and algorithms on real ad data obtained from a commercial search engine. The results demonstrate that our proposed approach can outperform several baselines in guaranteeing the number of clicks for the given advertisers and in increasing the total revenue for the search engine.

Citations: 20

Estimating interleaved comparison outcomes from historical click data

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398516

Katja Hofmann, Shimon Whiteson, M. de Rijke

Citations: 31

Being picky: processing top-k queries with set-defined selections

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396877

A. Stupar, S. Michel

Citations: 1

Fast approximation of Steiner trees in large graphs

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398460

Andrey Gubichev, Thomas Neumann

Citations: 22