{"title":"Characterizing commercial intent","authors":"Azin Ashkan, C. Clarke","doi":"10.1145/1645953.1645965","DOIUrl":"https://doi.org/10.1145/1645953.1645965","url":null,"abstract":"Understanding the intent underlying users' queries may help personalize search results and therefore improve user satisfaction. We develop a methodology for using the content of search engine result pages (SERPs) along with the information obtained from query strings to study characteristics of query intent, with a particular focus on sponsored search. This work represents an initial step towards the development and evaluation of an ontology for commercial search, considering queries that reference specific products, brands and retailers. The characteristics of query categories are studied with respect to aggregated users' clickthrough behavior on advertising links. We present a model for clickthrough behavior that considers the influence of such factors as the location of ads and the rank of ads, along with query category. We evaluate our work using a large corpus of clickthrough data obtained from a major commercial search engine. Our findings suggest that query-based features, along with the content of SERPs, are effective in detecting query intent. The clickthrough behavior is found to be consistent with the classification for the general categories of query intent, while for product, brand and retailer categories, this is true to a lesser extent.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133785998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of methods for relative comparison of retrieval systems based on clickthroughs","authors":"Jing He, ChengXiang Zhai, Xiaoming Li","doi":"10.1145/1645953.1646293","DOIUrl":"https://doi.org/10.1145/1645953.1646293","url":null,"abstract":"The Cranfield evaluation method has some disadvantages, including its high cost in labor and inadequacy for evaluating interactive retrieval techniques. As a very promising alternative, automatic comparison of retrieval systems based on observed clicking behavior of users has recently been studied. Several methods have been proposed, but there has so far been no systematic way to assess which strategy is better, making it difficult to choose a good method for real applications. In this paper, we propose a general way to evaluate these relative comparison methods with two measures: utility to users(UtU) and effectiveness of differentiation(EoD). We evaluate two state of the art methods by systematically simulating different retrieval scenarios. Inspired by the weakness of these methods revealed through our evaluation, we further propose a novel method by considering the positions of clicked documents. Experiment results show that our new method performs better than the existing methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115586747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive web mining of bilingual lexicons for cross language information retrieval","authors":"Lei Shi","doi":"10.1145/1645953.1646172","DOIUrl":"https://doi.org/10.1145/1645953.1646172","url":null,"abstract":"Bilingual web pages contain abundant term translation knowledge which is crucial for query translation in Cross Language Information Retrieval systems. But it is a challenging task to extract term translations from bilingual web pages due to the variation in web page layouts and writing styles. In this paper, based on the observation that translation pairs on the same web page tend to appear following similar patterns, a new extraction model is proposed to adaptively learn extraction patterns and exploit them to facilitate term translation mining from bilingual web pages. Experiments reflect that this model can significantly improve extraction coverage while maintaining high accuracy. It improves query translation in cross-language information retrieval, leading to significantly higher retrieval effectiveness on TREC collections.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114138492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A word clustering approach for language model-based sentence retrieval in question answering systems","authors":"S. Momtazi, D. Klakow","doi":"10.1145/1645953.1646263","DOIUrl":"https://doi.org/10.1145/1645953.1646263","url":null,"abstract":"In this paper we propose a term clustering approach to improve the performance of sentence retrieval in Question Answering (QA) systems. As the search in question answering is conducted over smaller segments of data than in a document retrieval task, the problems of data sparsity and exact matching become more critical. In this paper we propose Language Modeling (LM) techniques to overcome such problems and improve the sentence retrieval performance. Our proposed methods include building class-based models by term clustering, and then employing higher order n-grams with the new class-based model. We report our experiments on the TREC 2007 questions from QA track. The results show that the methods investigated here enhanced the mean average precision of sentence retrieval from 23.62% to 29.91%.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114648134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to recommend questions based on user ratings","authors":"Ke Sun, Yunbo Cao, Xinying Song, Young-In Song, Xiaolong Wang, Chin-Yew Lin","doi":"10.1145/1645953.1646049","DOIUrl":"https://doi.org/10.1145/1645953.1646049","url":null,"abstract":"At community question answering services, users are usually encouraged to rate questions by votes. The questions with the most votes are then recommended and ranked at the top when users browse questions by category. As users are not obligated to rate questions, usually only a small proportion of questions eventually get rated. Thus, in this paper, we are concerned with learning to recommend questions from user ratings of a limited size. To overcome the data sparsity, we propose to utilize questions without user ratings as well. Further, as there is a certain amount of noise within user ratings (the preference of some users expressed in their ratings diverges from that of the majority of users), we design a new algorithm called 'majority-based perceptron algorithm' which can avoid the influence of noisy instances by emphasizing its learning over data instances from the majority users. Experimental results from a large collection of real questions confirm the effectiveness of our proposals.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116668696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voting in social networks","authors":"P. Boldi, F. Bonchi, C. Castillo, S. Vigna","doi":"10.1145/1645953.1646052","DOIUrl":"https://doi.org/10.1145/1645953.1646052","url":null,"abstract":"A voting system is a set of rules that a community adopts to take collective decisions. In this paper we study voting systems for a particular kind of community: electronically mediated social networks. In particular, we focus on delegative democracy (a.k.a. proxy voting) that has recently received increased interest for its ability to combine the benefits of direct and representative systems, and that seems also perfectly suited for electronically mediated social networks. In such a context, we consider a voting system in which users can only express their preference for one among the people they are explicitly connected with, and this preference can be propagated transitively, using an attenuation factor. We present this system and we study its properties. We also take into consideration the problem of missing votes, which is particularly relevant in online networks, as some recent case shows. Our experiments on real-world networks provide interesting insight into the significance and stability of the results obtained with the suggested voting system.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117124723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vetting the links of the web","authors":"Na Dai, Brian D. Davison","doi":"10.1145/1645953.1646220","DOIUrl":"https://doi.org/10.1145/1645953.1646220","url":null,"abstract":"Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, well-known directories such as the dmoz Open Directory Project, which maintains links to representative and authoritative external web pages within their various topics. Therefore, such sites involve many editors to manually revisit and revise links that have become out-of-date. To remedy this situation, we propose the novel web mining task of identifying outdated links on the web. We build a general classification model, primarily using local and global temporal features extracted from historical content, topic, link and time-focused changes over time. We evaluate our system via five-fold cross-validation on more than fifteen thousand ODP external links selected from thirteen top-level categories. Our system can predict the actions of ODP editors more than 75% of the time. Our models and predictions could be useful for various applications that depend on analysis of web links, including ranking and crawling.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115752781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting bidirectional links: making spamming detection easier","authors":"Yan Zhang, Qiancheng Jiang, Lei Zhang, Yizhen Zhu","doi":"10.1145/1645953.1646244","DOIUrl":"https://doi.org/10.1145/1645953.1646244","url":null,"abstract":"Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propose two page value metrics, AVRank and HVRank. These two \"values\" of all the web pages can be well assessed by using the bidirectional links' information. Moreover, with the help of bidirectional links, it becomes easier to enlarge the propagation coverage of seed sets. We further discuss the effectiveness of the combination of these two metrics, such as the quadratic mean of them. Our experimental results show that with such two metrics, our method can filter out spam sites and identify reputable ones more effectively than previous algorithms such as TrustRank.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116088276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RS-Wrapper: random write optimization for solid state drive","authors":"Da Zhou, Xiaofeng Meng","doi":"10.1145/1645953.1646144","DOIUrl":"https://doi.org/10.1145/1645953.1646144","url":null,"abstract":"Solid State Drive (SSD), emerging as new data storage media with high random read speed, has been widely used in laptops, desktops, and data servers to replace hard disk during the past few years. However, poor random write performance becomes the bottleneck in practice. In this paper, we propose to insert unmodified data into the random write sequence in order to convert random writes into sequential writes, so that the data sequence can be flushed at the speed of sequential write. Further, we propose a clustering strategy to improve the performance by reducing the quantity of unmodified data to read. After exploring the intrinsic parallelism of SSD, we also propose, for the first time, to flush write sequences with the help of the simultaneous program between planes and parallel program between devices. Comprehensive experiments show that our method outperforms the existing random-write solution by up to one order of magnitude.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122174037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To divide and conquer search ranking by learning query difficulty","authors":"Z. Zhu, Weizhu Chen, Tao Wan, Chenguang Zhu, Gang Wang, Zheng Chen","doi":"10.1145/1645953.1646255","DOIUrl":"https://doi.org/10.1145/1645953.1646255","url":null,"abstract":"Learning to rank plays an important role in information retrieval. In most of the existing solutions for learning to rank, all the queries with their returned search results are learnt and ranked with a single model. In this paper, we demonstrate that it is highly beneficial to divide queries into multiple groups and conquer search ranking based on query difficulty. To this end, we propose a method which first characterizes a query using a variety of features extracted from user search behavior, such as the click entropy and the query reformulation probability. Next, a classification model is built on these extracted features to assign a score that represents how difficult a query is. Based on this score, our method automatically divides queries into groups, and trains a specific ranking model for each group to conquer search ranking. Experimental results on RankSVM and RankNet with a large-scale evaluation dataset show that the proposed method can achieve significant improvement in the task of web search ranking.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121477023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}