Abhijith Kashyap, Vagelis Hristidis, M. Petropoulos
{"title":"FACeTOR: cost-driven exploration of faceted query results","authors":"Abhijith Kashyap, Vagelis Hristidis, M. Petropoulos","doi":"10.1145/1871437.1871530","DOIUrl":"https://doi.org/10.1145/1871437.1871530","url":null,"abstract":"Faceted navigation is being increasingly employed as an effective technique for exploring large query results on structured databases. This technique of mitigating information-overload leverages metadata of the query results to provide users with facet conditions that can be used to progressively refine the user's query and filter the query results. However, the number of facet conditions can be quite large, thereby increasing the burden on the user. We present the FACeTOR system that proposes a cost-based approach to faceted navigation. At each step of the navigation, the user is presented with a subset of all possible facet conditions that are selected such that the overall expected navigation cost is minimized and every result is guaranteed to be reachable by a facet condition. We prove that the problem of selecting the optimal facet conditions at each navigation step is NP-Hard, and subsequently present two intuitive heuristics employed by FACeTOR. Our user study at Amazon Mechanical Turk shows that FACeTOR reduces the user navigation time compared to the cutting edge commercial and academic faceted search algorithms. The user study also confirms the validity of our cost model. We also present the results of an extensive experimental evaluation on the performance of the proposed approach using two real datasets. FACeTOR is available at http://db.cse.buffalo.edu/facetor/.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125163919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically weighting tags in XML collection","authors":"Dexi Liu, Changxuan Wan, Lei Chen, X. Liu","doi":"10.1145/1871437.1871603","DOIUrl":"https://doi.org/10.1145/1871437.1871603","url":null,"abstract":"In XML retrieval, nodes with different tags play different roles in XML documents and then tags should be reflected in the relevance ranking. An automatic method is proposed in this paper to infer the weights of tags. We first investigate 15 features about tags, and then select five of them based on the correlations between these features and manual tag weights. Using these features, a tag weight assignment model, ATG, is designed. We evaluate the performance of ATG on two real data sets, IEEECS and Wikipedia from two different perspectives. One is to evaluate the quality of the model by measuring the correlation between weights generated by our model and those given by experts. The other is to test the effectiveness of the model in improving retrieval performance. Experimental results show that the tag weights generated by ATG are highly correlated with the manually assigned weights and the ATG model improves retrieval effectiveness significantly.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"29 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120931086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter Bjellerup, Karl J. Cama, Mukundan Desikan, Yi Guo, Ajinkya Kale, J. Lai, Nizar Lethif, Jie Lu, Mercan Topkara, Stephan H. Wissel
{"title":"FALCON: seamless access to meeting data from the inbox and calendar","authors":"Peter Bjellerup, Karl J. Cama, Mukundan Desikan, Yi Guo, Ajinkya Kale, J. Lai, Nizar Lethif, Jie Lu, Mercan Topkara, Stephan H. Wissel","doi":"10.1145/1871437.1871780","DOIUrl":"https://doi.org/10.1145/1871437.1871780","url":null,"abstract":"We present a system that supports seamless access to information contained in recorded meetings from the cornerstone points of a knowledge worker's daily life: mailbox and calendar. The solution supports granular search of meeting content from an enterprise email system and automatically displays recordings of meetings related to the message the user is currently viewing. Additionally thumbnail summaries of the meetings are added to the user's calendar entries after the meetings have taken place. Lastly our system supports easy sharing of videos associated with recorded meetings through the use of hot-linked thumbnail summaries which can be sent via email.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123478801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GM PavanKumar, Krishna P. Leela, Mehul Parsana, S. Garg
{"title":"Relevance-index size tradeoff in contextual advertising","authors":"GM PavanKumar, Krishna P. Leela, Mehul Parsana, S. Garg","doi":"10.1145/1871437.1871713","DOIUrl":"https://doi.org/10.1145/1871437.1871713","url":null,"abstract":"In Contextual advertising, textual ads relevant to the content in a webpage are embedded in the page. Content keywords are extracted offline by crawling webpages and then stored in an index for fast serving. Given a page, ad selection involves index lookup, computing similarity between the keywords of the page and those of candidate ads and returning the top-k scoring ads. In this approach, there is a tradeoff between relevance and index size where better relevance can be achieved if there are no limits on the index size. However, the assumption of unlimited index size is not practical due to the large number of pages on the Web and stringent requirements on the serving latency. Secondly, page visits on the web follows power-law distribution where a significant proportion of the pages are visited infrequently, also called the tail pages. Indexing tail pages is not efficient given that these pages are accessed very infrequently. We propose a novel mechanism to mitigate these problems in the same framework. The basic idea is to index the same keyword vector for a set of similar pages. The scheme involves learning a website specific hierarchy from (page, URL) pairs of the website. Next, keywords are populated on the nodes via bottom-up traversal over the hierarchy. We evaluate our approach on a human labeled dataset where our approach has higher nDCG compared to a recent approach even though the index size of our approach is 7 times less than index size of the recent approach.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124153262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William Webber, Douglas W. Oard, Falk Scholer, Bruce Hedin
{"title":"Assessor error in stratified evaluation","authors":"William Webber, Douglas W. Oard, Falk Scholer, Bruce Hedin","doi":"10.1145/1871437.1871508","DOIUrl":"https://doi.org/10.1145/1871437.1871508","url":null,"abstract":"Several important information retrieval tasks, including those in medicine, law, and patent review, have an authoritative standard of relevance, and are concerned about retrieval completeness. During the evaluation of retrieval effectiveness in these domains, assessors make errors in applying the standard of relevance, and the impact of these errors, particularly on estimates of recall, is of crucial concern. Using data from the interactive task of the TREC Legal Track, this paper investigates how reliably the yield of relevant documents can be estimated from sampled assessments in the presence of assessor error, particularly where sampling is stratified based upon the results of participating retrieval systems. We show that assessor error is in general a greater source of inaccuracy than sampling error. A process of appeal and adjudication, such as used in the interactive task, is found to be effective at locating many assessment errors; but the process is expensive if complete, and biased if incomplete. An unbiased double-sampling method for resolving assessment error is proposed, and shown on representative data to be more efficient and accurate than appeal-based adjudication.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1992 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128611780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brown dwarf: a P2P data-warehousing system","authors":"Katerina Doka, Dimitrios Tsoumakos, N. Koziris","doi":"10.1145/1871437.1871777","DOIUrl":"https://doi.org/10.1145/1871437.1871777","url":null,"abstract":"In this demonstration we present the Brown Dwarf, a distributed system designed to efficiently store, query and update multidimensional data. Deployed on any number of commodity nodes, our system manages to distribute large volumes of data over network peers on-the-fly and process queries and updates on-line through cooperating nodes that hold parts of a materialized cube. Moreover, it adapts its resources according to demand and hardware failures and is cost-effective both over the required hardware and software components. All the aforementioned functionality will be tested using various datasets and query loads.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128495305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent interest-topic model: finding the causal relationships behind dyadic data","authors":"N. Kawamae","doi":"10.1145/1871437.1871521","DOIUrl":"https://doi.org/10.1145/1871437.1871521","url":null,"abstract":"This paper presents a hierarchical generative model that captures the latent relation of cause and effect underlying user behavioral-originated data such as papers, twitter and purchase history. Our proposel, the Latent Interest Topic model (LIT), introduces a latent variable into each document and each author layor in a coherent generative model. We call the former variable the document class, and the latter variable the author class, where these classes are indicator variables that allow the inclusion of different types of probability, and can be shared over documents with similar content and authors with similar interests, respectively. Significantly, unlike other works, LIT differentiates, respectively, document topics and user interests by using these classes. Consequently, LIT is superior to previous models in explaining the causal relationships behind the data by merging similar distributions; it also makes the computation process easier. Experiments on a research paper corpus show that the proposed model can well capture document and author classes, and reduce the dimensionality of documents to a low-dimensional author-document space, making it useful as a generative model.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129267287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiankun Zhao, Yuan Tian, Qi He, Nuria Oliver, R. Jin, Wang-Chien Lee
{"title":"Communication motifs: a tool to characterize social communications","authors":"Qiankun Zhao, Yuan Tian, Qi He, Nuria Oliver, R. Jin, Wang-Chien Lee","doi":"10.1145/1871437.1871694","DOIUrl":"https://doi.org/10.1145/1871437.1871694","url":null,"abstract":"Social networks mediate not only the relations between entities, but also the patterns of information propagation among them and their communication behavior. In this paper, we extensively study the temporal annotations (e.g., time stamps and duration) of historical communications in social networks and propose two novel tools -- communication motifs and maximum-flow communication motifs -- for characterizations of the patterns of information propagation in social networks. Using these motifs, we verify the following hypothesis in social communication network: 1) the functional behavioral patterns of information propagation within both social networks are stable over time; 2) the patterns of information propagation in synchronous and asynchronous social networks are different and sensitive to the cost of communication; and 3) the speed and the amount of information that is propagated through a network are correlated and dependent on individual profiles.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"71 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129655903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Broder, E. Gabrilovich, V. Josifovski, G. Mavromatis, Donald Metzler, Jane Wang
{"title":"Exploiting site-level information to improve web search","authors":"A. Broder, E. Gabrilovich, V. Josifovski, G. Mavromatis, Donald Metzler, Jane Wang","doi":"10.1145/1871437.1871630","DOIUrl":"https://doi.org/10.1145/1871437.1871630","url":null,"abstract":"Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine learning ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this work, we introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, we devise two strategies for compactly representing entire Web sites. We formalize our approach by building two indices, a traditional page index and a new site index, where each \"document\" represents the an entire Web site. At runtime, a query is first executed against both indices, and then the final page score for a given query is produced by combining the scores of the page and its site. Experimental results carried out on a large-scale Web search test collection from a major commercial search engine confirm the proposed approach leads to consistent and significant improvements in retrieval effectiveness.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130712474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller
{"title":"Online annotation of text streams with structured entities","authors":"K. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller","doi":"10.1145/1871437.1871446","DOIUrl":"https://doi.org/10.1145/1871437.1871446","url":null,"abstract":"We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database. The algorithm allows one to correlate unstructured and dirty text streams from sources such as emails, chats and blogs, to entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to them top-k relevant entities. Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rate by self-adjusting multiple run-time parameters to reduce or improve the quality of annotation for fast or slow streams, respectively. The framework also allows the online annotation algorithm to incorporate query feedback to learn the user preference and personalize the annotation for individual users.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127531608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}