{"title":"Group ranking with application to image retrieval","authors":"Ou Wu, Weiming Hu, Bing Li","doi":"10.1145/1871437.1871642","DOIUrl":"https://doi.org/10.1145/1871437.1871642","url":null,"abstract":"Many existing ranking-related information processing applications can be summarized into one theoretical problem called group ranking (GR). A simple average-ranking approach is usually applied to GR. Although the approach seems reasonable, no theoretical analysis about its intrinsic mechanism has been presented, increasing the difficulty of evaluating the ranking results. This study provides a formal analysis for GR. We first construct an objective function for the GR problem, and discover that each GR problem can be transformed into a rank aggregation problem whose objective function is proved to be equal to the objective function of GR. As a consequence, the average-ranking approach can be explained by two well-known rank aggregation techniques. We incorporate two other effective rank aggregation methods into the GR problem and obtain two new GR algorithms. We apply the GR algorithms into image retrieval to diversify the image search results returned by search engines. Experimental results show the effectiveness of the proposed GR algorithms.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121971330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document update summarization using incremental hierarchical clustering","authors":"Dingding Wang, Tao Li","doi":"10.1145/1871437.1871476","DOIUrl":"https://doi.org/10.1145/1871437.1871476","url":null,"abstract":"Document summarization has become a hot topic in recent years. However, most of existing summarization methods work on a batch of documents and do not consider that documents may arrive in a sequence and the corresponding summaries need to be updated in real time. In this paper, we propose a new summarization method based on an incremental hierarchical clustering framework to update summaries as soon as a new document arrives. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed method.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"33 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120875121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning sentiment classification model from labeled features","authors":"Yulan He","doi":"10.1145/1871437.1871704","DOIUrl":"https://doi.org/10.1145/1871437.1871704","url":null,"abstract":"We propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatical domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than exiting weakly-supervised sentiment classification methods despite using no labeled documents.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125882588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emine Yilmaz, Milad Shokouhi, Nick Craswell, S. Robertson
{"title":"Expected browsing utility for web search evaluation","authors":"Emine Yilmaz, Milad Shokouhi, Nick Craswell, S. Robertson","doi":"10.1145/1871437.1871672","DOIUrl":"https://doi.org/10.1145/1871437.1871672","url":null,"abstract":"Most information retrieval evaluation metrics are designed to measure the satisfaction of the user given the results returned by a search engine. In order to evaluate user satisfaction, most of these metrics have underlying user models, which aim at modeling how users interact with search engine results. Hence, the quality of an evaluation metric is a direct function of the quality of its underlying user model. This paper proposes EBU, a new evaluation metric that uses a sophisticated user model tuned by observations over many thousands of real search sessions. We compare EBU with a number of state of the art evaluation metrics and show that it is more correlated with real user behavior captured by clicks.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128223439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query optimization for ontology-based information integration","authors":"Yingjie Li, J. Heflin","doi":"10.1145/1871437.1871623","DOIUrl":"https://doi.org/10.1145/1871437.1871623","url":null,"abstract":"In recent years, there has been an explosion of publicly available RDF and OWL data sources. In order to effectively and quickly answer queries in such an environment, we present an approach to identifying the potentially relevant Semantic Web data sources using query rewritings and a term index. We demonstrate that such an approach must carefully handle query goals that lack constants; otherwise the algorithm may identify many sources that do not contribute to eventual answers. This is because the term index only indicates if URIs are present in a document, and specific answers to a subgoal cannot be calculated until the source is physically accessed - an expensive operation given disk/network latency. We present an algorithm that, given a set of query rewritings that accounts for ontology heterogeneity, incrementally selects and processes sources in order to maintain selectivity. Once sources are selected, we use an OWL reasoner to answer queries over these sources and their corresponding ontologies. We present the results of experiments using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127160249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online learning for multi-task feature selection","authors":"Haiqin Yang, Irwin King, Michael R. Lyu","doi":"10.1145/1871437.1871706","DOIUrl":"https://doi.org/10.1145/1871437.1871706","url":null,"abstract":"Multi-task feature selection (MTFS) is an important tool to learn the explanatory features across multiple related tasks. Previous MTFS methods fulfill this task in batch-mode training. This makes them inefficient when data come in sequence or when the number of training data is so large that they cannot be loaded into the memory simultaneously. To tackle these problems, we propose the first online learning framework for MTFS. A main advantage of the online algorithms is the efficiency in both time complexity and memory cost due to the closed-form solutions in updating the model weights at each iteration. Experimental results on a real-world dataset attest to the merits of the proposed algorithms.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132186126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On top-k social web search","authors":"Peifeng Yin, Wang-Chien Lee, Ken C. K. Lee","doi":"10.1145/1871437.1871609","DOIUrl":"https://doi.org/10.1145/1871437.1871609","url":null,"abstract":"To enhance the quality of document search, recent research studies have started to exploit the social networks of users by considering social influence (SI), measurement of the affinity between a query user and the publisher of a retrieved document, in addition to the commonly used textual relevance (TR). We refer to such document search that considers social networks as social web search. In this paper, we focus on efficient top-k social web search and propose two search strategies: (i) TR-based search and (ii) SI-based search that tailor document examination orders upon TR and SI, respectively. We evaluate the proposed strategies through experimentation.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130600242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active caching for similarity queries based on shared-neighbor information","authors":"M. Houle, Vincent Oria, U. Qasim","doi":"10.1145/1871437.1871524","DOIUrl":"https://doi.org/10.1145/1871437.1871524","url":null,"abstract":"Novel applications such as recommender systems, uncertain databases, and multimedia databases are designed to process similarity queries that produce ranked lists of objects as their results. Similarity queries typically result in disk access latency and incur a substantial computational cost. In this paper, we propose an 'active caching' technique for similarity queries that is capable of synthesizing query results from cached information even when the required result list is not explicitly stored in the cache. Our solution, the Cache Estimated Significance (CES) model, is based on shared-neighbor similarity measures, which assess the strength of the relationship between two objects as a function of the number of other objects in the common intersection of their neighborhoods. The proposed method is general in that it does not require that the features be drawn from a metric space, nor does it require that the partial orders induced by the similarity measure be monotonic. Experimental results on real data sets show a substantial cache hit rate when compared with traditional caching approaches.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130921783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal dynamics and information retrieval","authors":"S. Dumais","doi":"10.1145/1871437.1871442","DOIUrl":"https://doi.org/10.1145/1871437.1871442","url":null,"abstract":"Many digital resources, like the Web, are dynamic and ever-changing collections of information. However, most of the tools information retrieval and management that have been developed for interacting with Web content, such as browsers and search engines, focus on a single static snapshot of the information. In this talk, I will present analyses of how Web content changes over time, how people re-visit Web pages over time, and how re-visitation patterns are influenced by changes in user intent and content. These results have implications for many aspects of information retrieval and management including crawling, ranking and information extraction algorithms, result presentation, and evaluation. I will describe a prototype system that supports people in understanding how the information they interact with changes over time, and a new retrieval model that incorporates features about the temporal evolution of content to improve core ranking. Finally, I will conclude with an overview of some general challenges that need to be addressed to fully incorporate temporal dynamics in information retrieval and information management systems.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130947905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences with using SVM-based learning for multi-objective ranking","authors":"L. Nguyen, Wai Gen Yee, Roger Liew, O. Frieder","doi":"10.1145/1871437.1871763","DOIUrl":"https://doi.org/10.1145/1871437.1871763","url":null,"abstract":"We describe our experiences in applying learning-to-rank techniques to improving the quality of search results of an online hotel reservation system. The search result quality factors we use are average booking position and distribution of margin in top-ranked results. (We expect that total revenue will increase with these factors.) Our application of the SVMRank technique improves booking position by up to 25% and margin distribution by up to 14%.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130474848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}