Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: latest articles
{"title":"Object ranking","authors":"R. V. Zwol, Srinivas Vadrevu","doi":"10.1145/2063576.2064038","DOIUrl":"https://doi.org/10.1145/2063576.2064038","url":null,"abstract":"Object ranking is an emerging discipline within information retrieval that is concerned with the ranking of objects, e.g. named entities and their attributes, in the context of a given user query or application. In this tutorial we will address the different aspects involved when building an object ranking system. We will present state-of-the-art research in object ranking and go into detail about our hands-on experience designing and developing the object ranking system that is in production at Yahoo! today. This allows for a unique mixture of research and development that will give the participants in-depth insights into the problem of object ranking.\nThe focus of current Web search engines is to retrieve relevant documents on the Web, and more precisely documents that match the query intent of the user. Some users are looking for specific information, while others just want to access rich media content (images, videos, etc.) or explore a topic. In the latter scenario, users do not have a fixed or pre-determined information need, but are using the search engine to discover information related to a particular object of interest. In this scenario one can say that the user is in an exploratory mode.\nTo support users in their exploratory search, search engines are offering semantic search suggestions. In this tutorial, we will present a generic framework for ranking related objects. This framework ranks related entities according to two dimensions: a lateral dimension and a faceted dimension. In the lateral dimension, related entities are of the same nature as the entity queried (e.g. Barcelona and Madrid, or Angelina Jolie and Jessica Alba). In the faceted dimension, related entities are usually not of the same type as the queried entity, and refer to a specific aspect of the queried entity (e.g. Jennifer Aniston and the TV show Friends).\nIn this tutorial we will describe the process of building a Web-scale object ranking system. In particular we will address the construction of a knowledge base that forms the basis for the object ranking, and the generation of ranking features using external sources such as search engine query logs, photo annotations in Flickr, and tweets on Twitter. Next, we will discuss machine-learned ranking models using an ensemble of pair-wise preference models, and address various aspects of object ranking, including multi-media extensions, vertical solutions, attribute-aware ranking, and the importance of freshness. Last but not least, we will address the evaluation methodologies used to tune the performance of Web-scale object ranking strategies.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"4 1","pages":"2613-2614"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80520048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The where in the tweet","authors":"Wen Li, P. Serdyukov, A. D. Vries, Carsten Eickhoff, M. Larson","doi":"10.1145/2063576.2063995","DOIUrl":"https://doi.org/10.1145/2063576.2063995","url":null,"abstract":"Twitter is a widely-used social networking service which enables its users to post text-based messages, so-called tweets. POI tags on tweets can show more human-readable high-level information about a place rather than just a pair of coordinates. In this paper, we attempt to predict the POI tag of a tweet based on its textual content and time of posting. Potential applications include accurate positioning when GPS devices fail and disambiguating places located near each other. We consider this task as a ranking problem, i.e., we try to rank a set of candidate POIs according to a tweet by using language and time models. To tackle the sparsity of tweets tagged with POIs, we use web pages retrieved by search engines as an additional source of evidence. From our experiments, we find that users indeed leak some information about their accurate locations in their tweets.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"64 1","pages":"2473-2476"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80342061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
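The abstract above frames POI prediction as ranking candidate POIs for a tweet with language and time models. The sketch below illustrates only the language-model half, under assumptions that are not taken from the paper: a unigram model per POI, add-one smoothing, whitespace tokenization, and the hypothetical names `score`, `rank_pois`, and `poi_texts`.

```python
import math
from collections import Counter

def score(tweet_tokens, poi_counts, vocab_size):
    """Log-likelihood of the tweet under a POI's unigram model (add-one smoothing)."""
    total = sum(poi_counts.values())
    return sum(
        math.log((poi_counts[w] + 1) / (total + vocab_size))
        for w in tweet_tokens
    )

def rank_pois(tweet, poi_texts):
    """Return candidate POI names, most likely first."""
    tokens = tweet.lower().split()
    counts = {poi: Counter(text.lower().split()) for poi, text in poi_texts.items()}
    # Shared vocabulary: every word seen in the tweet or in any POI's text.
    vocab = set(tokens)
    for c in counts.values():
        vocab.update(c)
    return sorted(
        counts,
        key=lambda poi: score(tokens, counts[poi], len(vocab)),
        reverse=True,
    )

# Hypothetical training text per POI (e.g. past tagged tweets or web pages).
poi_texts = {
    "cafe": "coffee latte espresso coffee",
    "stadium": "match goal football crowd",
}
ranking = rank_pois("great coffee and latte", poi_texts)
print(ranking)
```

A time model, as mentioned in the abstract, could be combined by adding a log-probability of the posting hour per POI to each score; that combination is likewise an assumption here.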
{"title":"DOLAP 2011: overview of the 14th international workshop on data warehousing and olap","authors":"A. Cuzzocrea, K. Davis, I. Song","doi":"10.1145/2063576.2064055","DOIUrl":"https://doi.org/10.1145/2063576.2064055","url":null,"abstract":"The ACM 14th International Workshop on Data Warehousing and OLAP (DOLAP 2011), held in Glasgow, Scotland, UK on October 28, 2011, in conjunction with the ACM 20th International Conference on Information and Knowledge Management (CIKM 2011), presents research on data warehousing and On-Line Analytical Processing (OLAP). The DOLAP 2011 program has three interesting sessions on data warehouse modeling and maintenance, ETL and performance, and OLAP visualization and extensions, and a panel discussing analytics in data warehouses.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"7 1","pages":"2645-2646"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80467880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tightly coupling visual and linguistic features for enriching audio-based web browsing experience","authors":"M. Islam, Faisal Ahmed, Y. Borodin, I. Ramakrishnan","doi":"10.1145/2063576.2063896","DOIUrl":"https://doi.org/10.1145/2063576.2063896","url":null,"abstract":"People who are blind use screen readers for browsing web pages. Since screen readers read out content serially, a naive readout tends to mix irrelevant and relevant content thereby disrupting the coherency of the material being read out and confusing the listener. To address this problem we can partition web pages into coherent segments and narrate each such piece separately. Extant methods to do segmentation use visual and structural cues without taking the semantics into account and consequently create segments containing irrelevant material. In this paper, we describe a new technique for creating coherent segments by tightly coupling visual, structural, and linguistic features present in the content. A notable aspect of the technique is that it produces segments with little irrelevant content. Preliminary experiments indicate that the technique is effective in creating highly coherent segments and the experiences of an early adopter who is blind suggest that it enriches the overall browsing experience.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"347 1","pages":"2085-2088"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82977254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perspective hierarchical dirichlet process for user-tagged image modeling","authors":"Xin Chen, Xiaohua Hu, Yuan An, Zunyan Xiong, Tingting He, Eun Kyo Park","doi":"10.1145/2063576.2063770","DOIUrl":"https://doi.org/10.1145/2063576.2063770","url":null,"abstract":"In this paper, we propose a perspective Hierarchical Dirichlet Process (pHDP) model for user-tagged image modeling. The contribution is two-fold. First, we associate image features with image tags. Second, we incorporate users' perspectives into the image tag generation process and introduce new latent variables to determine whether an image tag is generated from the user's perspective or from the image content. Therefore, the model is able to extract both embedded semantic components and users' perspectives from user-tagged images. Based on the proposed pHDP model, we achieve automatic image tagging that reflects users' perspectives. Experimental results show that the pHDP model achieves better image tagging performance compared to state-of-the-art topic models.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"7 1","pages":"1341-1346"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83356570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient methods for finding influential locations with adaptive grids","authors":"D. Yan, R. C. Wong, Wilfred Ng","doi":"10.1145/2063576.2063788","DOIUrl":"https://doi.org/10.1145/2063576.2063788","url":null,"abstract":"Given a set S of servers and a set C of clients, an optimal-location query returns a location where a new server can attract the greatest number of clients. Optimal-location queries are important in many real-life applications, such as mobile service planning or resource distribution in an area. Previous studies assume that a client always visits its nearest server, an assumption that is too strict to hold in practice. In this paper, we relax this assumption and propose a new model to tackle this problem. We further generalize the problem to finding top-k optimal locations. The main challenge is that even the fastest existing approach takes hours to answer an optimal-location query on a typical real-world dataset, which significantly limits the applications of the query. Using our relaxed model, we design an efficient grid-based approximation algorithm called FILM (Fast Influential Location Miner) to answer the queries. FILM is orders of magnitude faster than the best-known previous work, and the number of clients attracted by a new server at the returned location often exceeds 98% of the optimal. The algorithm is extended to finding k influential locations. Extensive experiments are conducted to show the efficiency and effectiveness of FILM on both real and synthetic datasets.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"30 1","pages":"1475-1484"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82234173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
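To make the optimal-location query in the abstract above concrete, here is a minimal sketch of a grid-based approximation. It is not FILM: it assumes the classical nearest-server model (which the paper relaxes), a uniform rather than adaptive grid, Euclidean distance, and hypothetical function names.

```python
import math

def attracted(candidate, clients, servers):
    """Count clients that are strictly closer to the candidate than to any existing server."""
    count = 0
    for c in clients:
        nearest = min(math.dist(c, s) for s in servers)
        if math.dist(c, candidate) < nearest:
            count += 1
    return count

def best_grid_location(clients, servers, bounds, cells_per_side):
    """Evaluate the center of every grid cell and return (best_center, clients_attracted)."""
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / cells_per_side
    dy = (ymax - ymin) / cells_per_side
    best, best_count = None, -1
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            candidate = (xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            count = attracted(candidate, clients, servers)
            if count > best_count:
                best, best_count = candidate, count
    return best, best_count

servers = [(0, 0), (5, 5)]
clients = [(9, 9), (8, 9), (1, 0)]
result = best_grid_location(clients, servers, (0, 0, 10, 10), 2)
print(result)  # ((7.5, 7.5), 2)
```

A finer grid gives a better approximation at a higher cost, since every cell is compared against every client and server; the abstract's adaptive grids presumably address that trade-off, but how they do so is not described there.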
{"title":"Semi-indexing semi-structured data in tiny space","authors":"G. Ottaviano, R. Grossi","doi":"10.1145/2063576.2063790","DOIUrl":"https://doi.org/10.1145/2063576.2063790","url":null,"abstract":"Semi-structured textual formats are gaining increasing popularity for the storage of document collections and rich logs. Their flexibility comes at the cost of having to load and parse a document entirely even if just a small part of it needs to be accessed. For instance, in data analytics massive collections are usually scanned sequentially, selecting a small number of attributes from each document. We propose a technique to attach to a raw, unparsed document (even in compressed form) a \"semi-index\": a succinct data structure that supports operations on the document tree at speed comparable with an in-memory deserialized object, thus bridging textual formats with binary formats. After describing the general technique, we focus on the JSON format: our experiments show that avoiding the full loading and parsing step can give speedups of up to 12 times for on-disk documents using a small space overhead.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"5 1","pages":"1485-1494"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81600276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local computation of PageRank: the ranking side","authors":"M. Bressan, Luca Pretto","doi":"10.1145/2063576.2063670","DOIUrl":"https://doi.org/10.1145/2063576.2063670","url":null,"abstract":"Imagine you are a social network user who wants to search, in a list of potential candidates, for the best candidate for a job on the basis of their PageRank-induced importance ranking. Is it possible to compute this ranking at low cost, by visiting only small subnetworks around the nodes that represent each candidate? The fundamental problem underpinning this question, i.e. computing locally the PageRank ranking of $k$ nodes in an $n$-node graph, was first raised by Chen et al. (CIKM 2004) and then restated by Bar-Yossef and Mashiach (CIKM 2008). In this paper we formalize and provide the first analysis of the problem, proving that any local algorithm that computes a correct ranking must take into consideration Ω(√(kn)) nodes -- even when ranking the top $k$ nodes of the graph, even if their PageRank scores are \"well separated\", and even if the algorithm is randomized (and we prove a stronger Ω(n) bound for deterministic algorithms). Experiments carried out on large, publicly available crawls of the web and of a social network show that in practice, too, the fraction of the graph to be visited to compute the ranking may be considerable, both for algorithms that are always correct and for algorithms that employ (efficient) local score approximations.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"28 1","pages":"631-640"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82341758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs","authors":"Konstantin Tretyakov, Abel Armas-Cervantes, L. García-Bañuelos, J. Vilo, M. Dumas","doi":"10.1145/2063576.2063834","DOIUrl":"https://doi.org/10.1145/2063576.2063834","url":null,"abstract":"Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of approximate methods have been proposed, including several landmark-based methods that have been shown to scale up to very large graphs with acceptable accuracy. This paper presents two improvements to existing landmark-based shortest path estimation methods. The first improvement relates to the use of shortest-path trees (SPTs). Together with appropriate short-cutting heuristics, the use of SPTs makes it possible to achieve higher accuracy with acceptable time and memory overhead. Furthermore, SPTs can be maintained incrementally under edge insertions and deletions, which allows for a fully dynamic algorithm. The second improvement is a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks. The improved method is evaluated on the DBLP, Orkut, Twitter and Skype social networks.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"15 1","pages":"1785-1794"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82523921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
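For readers unfamiliar with landmark-based estimation, the following sketch shows the basic idea the abstract above builds on: precompute distances from a few landmark vertices, then bound d(s, t) from above by d(s, l) + d(l, t) over all landmarks l. It assumes an unweighted, undirected graph stored as an adjacency dictionary, and the function names are hypothetical. It does not implement the paper's shortest-path-tree improvement, dynamic updates, or landmark selection strategy.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_landmark_index(graph, landmarks):
    """Precompute one distance table per landmark."""
    return {l: bfs_distances(graph, l) for l in landmarks}

def estimate_distance(index, s, t):
    """Upper bound on d(s, t): the best detour through any landmark."""
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best

# Path graph a - b - c - d - e with a single landmark at c.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
index = build_landmark_index(graph, ["c"])
print(estimate_distance(index, "a", "e"))  # 4: exact, because c lies on the shortest path
print(estimate_distance(index, "a", "b"))  # 3: an overestimate, the true distance is 1
```

The second query shows why landmark choice matters: the estimate is exact only when some landmark lies on a shortest path between the two vertices, which is the coverage that the abstract's selection strategy seeks to maximize.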
{"title":"Retrieving and ranking unannotated images through collaboratively mining online search results","authors":"Songhua Xu, Hao Jiang, F. Lau","doi":"10.1145/2063576.2063650","DOIUrl":"https://doi.org/10.1145/2063576.2063650","url":null,"abstract":"We present a new image search and ranking algorithm for retrieving unannotated images by collaboratively mining online search results, which consist of online image and text search results. The online image search results are leveraged as reference examples to perform content-based image search over unannotated images. The online text search results are utilized to estimate the reference images' relevance to the search query. The key feature of our method is its capability to deal with unreliable online image search results through jointly mining visual and textual aspects of online search results. Through such collaborative mining, our algorithm infers the relevance of an online search result image to a text query. Once we obtain the estimate of query relevance score for each online image search result, we can selectively use query-specific online search result images as reference examples for retrieving and ranking unannotated images. We tested our algorithm on both standard public image datasets and several modestly sized personal photo collections. We also compared our method with two well-known peer methods. The results indicate that our algorithm is superior to existing content-based image search algorithms for retrieving and ranking unannotated images.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"25 1","pages":"485-494"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80909168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}