{"title":"A scalable and effective full-text search in P2P networks","authors":"Y. Mass, Y. Sagiv, Michal Shmueli-Scheuer","doi":"10.1145/1645953.1646281","DOIUrl":"https://doi.org/10.1145/1645953.1646281","url":null,"abstract":"We consider the problem of full-text search involving multi-term queries in a network of self-organizing, autonomous peers. Existing approaches do not scale well with respect to the number of peers, because they either require access to a large number of peers or incur a high communication cost in order to achieve good query results. In this paper, we present a novel algorithmic framework for processing multi-term queries in P2P networks that achieves high recall while using (per-query) a small number of peers and a low communication cost, thereby enabling high query throughput. Our approach is based on per-query peer-selection strategy using two-dimensional histograms of score distributions. A full utilization of the histograms incurs a high communication cost. We show how to drastically reduce this cost by employing a two-phase peer-selection algorithm. We also describe an adaptive approach to peer selection that further increases the recall. Experiments on a large real-world collection show that the recall is indeed high while the number of involved peers and the communication cost are low.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124173501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Ji, Jun Yan, Ning Liu, Wen Zhang, Weiguo Fan, Zheng Chen
{"title":"ExSearch: a novel vertical search engine for online barter business","authors":"Lei Ji, Jun Yan, Ning Liu, Wen Zhang, Weiguo Fan, Zheng Chen","doi":"10.1145/1645953.1646125","DOIUrl":"https://doi.org/10.1145/1645953.1646125","url":null,"abstract":"E-Commerce has shown its exponentially-growing business value in the past decade. However, in contrast to the successful examples in online sales, such as Amazon1 and eBay2, the online barter business is still underexplored due to the lack of corresponding information aggregation service. In this paper, we design and implement a novel vertical search engine, called ExSearch, to aggregate online barter information for developing the barter market. Different from classical general purpose Web search engines, ExSearch adopts a focused crawler to gather related information from various websites. We propose to automatically extract the barter information from free-text Web pages such that the unstructured information is represented in structured databases. In addition, we utilize the data mining techniques such as regression to fulfill the missing information, which cannot be extracted from the Web pages. Finally, we validate and rank the search results according to user queries. Experimental results show that each component module in our proposed ExSearch system is efficient and effective. The volunteer users are satisfied by and interested in this novel vertical search engine.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125344336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using domain ontology for semantic web usage mining and next page prediction","authors":"Nizar R. Mabroukeh, C. Ezeife","doi":"10.1145/1645953.1646202","DOIUrl":"https://doi.org/10.1145/1645953.1646202","url":null,"abstract":"This paper proposes the integration of semantic information drawn from a web application's domain knowledge into all phases of the web usage mining process (preprocessing, pattern discovery, and recommendation/prediction). The goal is to have an intelligent semantics-aware web usage mining framework. This is accomplished by using semantic information in the sequential pattern mining algorithm to prune the search space and partially relieve the algorithm from support counting. In addition, semantic information is used in the prediction phase with low order Markov models, for less space complexity and accurate prediction, that will help ambiguous predictions problem. Experimental results show that semantics-aware sequential pattern mining algorithms can perform 4 times faster than regular non-semantics-aware algorithms with only 26% of the memory requirement.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124666241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudo relevance feedback using semantic clustering in relevance language model","authors":"Qiang Pu, Daqing He","doi":"10.1145/1645953.1646268","DOIUrl":"https://doi.org/10.1145/1645953.1646268","url":null,"abstract":"Pseudo relevance feedback has demonstrated to be in general an effective technique for improving retrieval effectiveness, but the noise in the top retrieved documents still can cause topic drift problem that affects the performance of certain topics. By viewing a document as an interaction of a set of independent hidden topics, we propose a novel semantic clustering technique using independent component analysis. Then within the language modeling framework, we apply the obtained semantic topic clusters into the query sampling process so that the sampling depends on the activated topics rather than on the individual document language model. Therefore, we obtain a semantic cluster based relevance language model, which uses pseudo relevance feedback technique without requiring any relevance training information. We applied the model on five TREC data sets. The experiments show that our model can significantly improve retrieval performance over traditional language models including relevance-based and clustering-based retrieval language models. The main contribution of the improvements comes from the estimation of the relevance model on the semantic clusters that are closely related to the query.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128862035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Who tags the tags?: a framework for bookmark weighting","authors":"David Carmel, Haggai Roitman, E. Yom-Tov","doi":"10.1145/1645953.1646176","DOIUrl":"https://doi.org/10.1145/1645953.1646176","url":null,"abstract":"In this work we propose a novel framework for bookmark weighting which allows us to estimate the effectiveness of each of the bookmarks individually. We show that by weighting bookmarks according to their estimated quality we can significantly improve search effectiveness. Using empirical evaluation on real data gathered from two large bookmarking systems, we demonstrate the effectiveness of the new framework for search enhancement.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"92 25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128865191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstration of an RFID middleware: LIT ALE manager","authors":"Qiang Wang, W. Ryu, Soohan Kim, B. Hong","doi":"10.1145/1645953.1646306","DOIUrl":"https://doi.org/10.1145/1645953.1646306","url":null,"abstract":"The sharp increase of RFID tags and readers require dedicated middleware solutions that manage readers and process event data. In this paper, we demonstrate a RFID middleware called LIT ALE Manager system with key features and also illustrate overall functional components of the middleware system with implementation techniques.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121645305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blog cascade affinity: analysis and prediction","authors":"Hui Li, S. Bhowmick, Aixin Sun","doi":"10.1145/1645953.1646095","DOIUrl":"https://doi.org/10.1145/1645953.1646095","url":null,"abstract":"Information propagation within the blogosphere is of much importance in implementing policies, marketing research, launching new products, and other applications. In this paper, we take a microscopic view of the information propagation pattern in blogosphere by investigating blog cascade affinity. A blog cascade is a group of posts linked together discussing about the same topic, and cascade affnity refers to the phenomenon of a blog's inclination to join a specific cascade. We identify and analyze an array of features that may affect a blogger's cascade joining behavior and utilize these features to predict cascade affinity of blogs. Evaluated on a real dataset consisting of 873,496 posts, our svm-based prediction achieved accuracy of 0.723 measured by F1. Our experiments also showed that among all features identified, the number of friends was the most important factor affecting bloggers' inclination to join cascades.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121923944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Becker, A. Broder, E. Gabrilovich, V. Josifovski, B. Pang
{"title":"What happens after an ad click?: quantifying the impact of landing pages in web advertising","authors":"H. Becker, A. Broder, E. Gabrilovich, V. Josifovski, B. Pang","doi":"10.1145/1645953.1645964","DOIUrl":"https://doi.org/10.1145/1645953.1645964","url":null,"abstract":"Unbeknownst to most users, when a query is submitted to a search engine two distinct searches are performed: the organic or algorithmic search that returns relevant Web pages and related data (maps, images, etc.), and the sponsored search that returns paid advertisements. While an enormous amount of work has been invested in understanding the user interaction with organic search, surprisingly little research has been dedicated to what happens after an ad is clicked, a situation we aim to correct. To this end, we define and study the process of context transfer, that is, the user's transition from Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases the user is shown one of three types of pages, namely, Homepage (the homepage of the advertiser), Category browse (a browse-able sub-catalog related to the original query), and Search transfer (the search results of the same query re-executed on the target site). We show that these three types of landing pages can be accurately distinguished using automatic text classification. Finally, using such an automatic classifier, we correlate the landing page type with conversion data provided by advertisers, and show that the conversion rate (i.e., users' response rate to ads) varies considerably according to the type. We believe our findings will further the understanding of users' response to search advertising in general, and landing pages in particular, and thus help advertisers improve their Web sites and help search engines select the most suitable ads.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125483310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study of methods for estimating query language models with pseudo feedback","authors":"Yuanhua Lv, ChengXiang Zhai","doi":"10.1145/1645953.1646259","DOIUrl":"https://doi.org/10.1145/1645953.1646259","url":null,"abstract":"We systematically compare five representative state-of-the-art methods for estimating query language models with pseudo feedback in ad hoc information retrieval, including two variants of the relevance language model, two variants of the mixture feedback model, and the divergence minimization estimation method. Our experiment results show that a variant of relevance model and a variant of the mixture model tend to outperform other methods. We further propose several heuristics that are intuitively related to the good retrieval performance of an estimation method, and show that the variations in how these heuristics are implemented in different methods provide a good explanation of many empirical observations.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114898435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable learning of collective behavior based on sparse social dimensions","authors":"Lei Tang, Huan Liu","doi":"10.1145/1645953.1646094","DOIUrl":"https://doi.org/10.1145/1645953.1646094","url":null,"abstract":"The study of collective behavior is to understand how individuals behave in a social network environment. Oceans of data generated by social media like Facebook, Twitter, Flickr and YouTube present opportunities and challenges to studying collective behavior in a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension based approach is adopted to address the heterogeneity of connections presented in social media. However, the networks in social media are normally of colossal size, involving hundreds of thousands or even millions of actors. The scale of networks entails scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the social-dimension based approach can efficiently handle networks of millions of actors while demonstrating comparable prediction performance as other non-scalable methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117211627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}