{"title":"Improving Entity Ranking for Keyword Queries","authors":"John Foley, Brendan T. O'Connor, J. Allan","doi":"10.1145/2983323.2983909","DOIUrl":"https://doi.org/10.1145/2983323.2983909","url":null,"abstract":"Knowledge bases about entities are an important part of modern information retrieval systems. A strong ranking of entities can be used to enhance query understanding and document retrieval or can be presented as another vertical to the user. Given a keyword query, our task is to provide a ranking of the entities present in the collection of interest. We are particularly interested in approaches to this problem that generalize to different knowledge bases and different collections. In the past, this kind of problem has been explored in the enterprise domain through Expert Search. Recently, a dataset was introduced for entity ranking from news and web queries from more general TREC collections. Approaches from prior work leverage a wide variety of lexical resources: e.g., natural language processing and relations in the knowledge base. We address the question of whether we can achieve competitive performance with minimal linguistic resources. We propose a set of features that do not require index-time entity linking, and demonstrate competitive performance on the new dataset. As this paper is the first non-introductory work to leverage this new dataset, we also find and correct certain aspects of the benchmark. 
To support a fair evaluation, we collect 38% more judgments and contribute annotator agreement information.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115689076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Entities with Just a Little Help from You","authors":"Jaspreet Singh, Johannes Hoffart, Avishek Anand","doi":"10.1145/2983323.2983798","DOIUrl":"https://doi.org/10.1145/2983323.2983798","url":null,"abstract":"Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only few people are interested in, and also at times lags behind until newly emerging entities are added. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-egg problem. To address this, our approach automatically generates candidate occurrences of entities, prompting the user for feedback to decide if the occurrence refers to the actual entity in question. This feedback gradually improves the knowledge and allows our methods to provide better candidate suggestions to keep the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. 
We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116777671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Semantic Word Vectors","authors":"J. Ebrahimi, D. Dou","doi":"10.1145/2983323.2983875","DOIUrl":"https://doi.org/10.1145/2983323.2983875","url":null,"abstract":"Distributed word representations are able to capture syntactic and semantic regularities in text. In this paper, we present a word representation scheme that incorporates authorship information. While maintaining similarity among related words in the induced distributed space, our word vectors can be effectively used for some text classification tasks too. We build on a log-bilinear document model (lbDm), which extracts document features, and word vectors based on word co-occurrence counts. First, we propose a log-bilinear author model (lbAm), which contains an additional author matrix. We show that by directly learning author feature vectors, as opposed to document vectors, we can learn better word representations for the authorship attribution task. Furthermore, authorship information has been found to be useful for sentiment classification. We enrich the author model with a sentiment tensor, and demonstrate the effectiveness of this hybrid model (lbHm) through our experiments on a movie review-classification dataset.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116913325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Time-Discounted Influence Maximization","authors":"Arijit Khan","doi":"10.1145/2983323.2983862","DOIUrl":"https://doi.org/10.1145/2983323.2983862","url":null,"abstract":"The classical influence maximization (IM) problem in social networks does not distinguish between whether a campaign gets viral in a week or in a year. From the practical standpoint, however, campaigns for a new technology or an upcoming movie must be spread as quickly as possible, otherwise they will be obsolete. To this end, we formulate and investigate the novel problem of maximizing the time-discounted influence spread in a social network, that is, the campaigner is interested in both \"when\" and \"how likely\" a user would be influenced. In particular, we assume that the campaigner has a utility function which monotonically decreases with the time required for a user to get influenced, since the activation of the seed nodes. The problem that we solve in this paper is to maximize the expected aggregated value of this utility function over all network users. This is a novel and relevant problem that, surprisingly, has not been studied before. Time-discounted influence maximization (TDIM), being a generalization of the classical IM, still remains NP-hard. However, our main contribution is to prove the sub-modularity of the objective function for any monotonically decreasing function of time, under a variety of influence cascading models, e.g., the independent cascade, linear threshold, and maximum influence arborescence models, thereby designing approximate algorithms with theoretical performance guarantees. We also illustrate that the existing optimization techniques (e.g., CELF) for influence maximization are more efficient over TDIM. 
Our experimental results demonstrate the effectiveness of our solutions over several baselines including the classical influence maximization algorithms.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116920332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Studying the Dark Triad of Personality through Twitter Behavior","authors":"Daniel Preotiuc-Pietro, J. Carpenter, Salvatore Giorgi, L. Ungar","doi":"10.1145/2983323.2983822","DOIUrl":"https://doi.org/10.1145/2983323.2983822","url":null,"abstract":"Research into the darker traits of human nature is growing in interest especially in the context of increased social media usage. This allows users to express themselves to a wider online audience. We study the extent to which the standard model of dark personality -- the dark triad -- consisting of narcissism, psychopathy and Machiavellianism, is related to observable Twitter behavior such as platform usage, posted text and profile image choice. Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage. Finally, we build a machine learning algorithm that predicts the dark triad of personality in out-of-sample users with reliable accuracy.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117263793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Distributed Regular Path Queries on RDF Graphs Using Partial Evaluation","authors":"Xin Wang, Junhu Wang, Xiaowang Zhang","doi":"10.1145/2983323.2983877","DOIUrl":"https://doi.org/10.1145/2983323.2983877","url":null,"abstract":"We propose an efficient distributed method for answering regular path queries (RPQs) on large-scale RDF graphs using partial evaluation. In local computation, we devise a dynamic programming approach to evaluate local and partial answers of an RPQ on each computing site in parallel. In the assembly phase, an automata-based algorithm is proposed to assemble the partial answers of the RPQ into the final results. The experiments on benchmark RDF graphs show that our method outperforms the state-of-the-art message passing methods by up to an order of magnitude.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121119538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval","authors":"Xin Jin, Daniel Agun, Tao Yang, Qinghao Wu, Yifan Shen, Susen Zhao","doi":"10.1145/2983323.2983733","DOIUrl":"https://doi.org/10.1145/2983323.2983733","url":null,"abstract":"The previous two-phase method for searching versioned documents seeks a cost tradeoff by using non-positional information to rank document versions first. The second phase then re-ranks top document versions using positional information with fragment-based index compression. This paper proposes an alternative approach that uses cluster-based retrieval to quickly narrow the search scope guided by version representatives at Phase 1 and develops a hybrid index structure with adaptive runtime data traversal to speed up Phase 2 search. The hybrid scheme exploits the advantages of forward index and inverted index based on the term characteristics to minimize the time in extracting positional and other feature information during runtime search. This paper compares several indexing and data traversal options with different time and space tradeoffs and describes evaluation results to demonstrate their effectiveness. 
The experimental results show that the proposed scheme can be up to about 4x as fast as the previous work on solid state drives while retaining good relevance.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127514849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory-Optimized Distributed Graph Processing through Novel Compression Techniques","authors":"Panagiotis Liakos, Katia Papakonstantinopoulou, A. Delis","doi":"10.1145/2983323.2983687","DOIUrl":"https://doi.org/10.1145/2983323.2983687","url":null,"abstract":"A multitude of contemporary applications now involve graph data whose size grows continuously, and this trend shows no signs of subsiding. This has caused the emergence of many distributed graph processing systems, including Pregel and Apache Giraph. However, the unprecedented scale now reached by real-world graphs makes graph processing harder even in distributed environments, and current memory usage patterns are rapidly becoming a primary concern for such contemporary graph processing systems. We seek to address this challenge by exploiting empirically observed properties of graphs that are generated by human activity. In this paper, we propose three space-efficient adjacency list representations that can be applied to any distributed graph processing system. Our suggested compact representations reduce the memory required to accommodate the graph elements by up to 5 times compared with state-of-the-art methods. 
At the same time, our memory-optimized methods retain the efficiency of uncompressed structures and enable the execution of algorithms for large scale graphs in settings where contemporary alternative structures fail due to memory errors.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123271566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting Geo-sensor Data with Participatory Sensing Based on Dropout Neural Network","authors":"Jyun-Yu Jiang, Cheng-te Li","doi":"10.1145/2983323.2983902","DOIUrl":"https://doi.org/10.1145/2983323.2983902","url":null,"abstract":"Nowadays, geosensor data, such as air quality and traffic flow, have become more and more essential in people's daily life. However, installing geosensors or hiring volunteers at every location and at all times is very expensive. Some organizations may have only a few facilities or a limited budget to sense these data. Moreover, people usually want to know the forecast rather than ongoing observations, but the limited number of sensors (or volunteers) is a hurdle to making precise predictions. In this paper, we propose a novel concept to forecast geosensor data with participatory sensing. Given a limited number of sensors or volunteers, participatory sensing assumes each of them can observe and collect data at different locations and at different times. By aggregating these sparse past observations, we propose a neural network based approach to forecast future geosensor data at any location in an urban area. Extensive experiments have been conducted with large-scale datasets of the air quality in three cities and the traffic of bike sharing systems in two cities. 
Experimental results show that our predictive model can precisely forecast the air quality and the bike rental traffic as geosensor data.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126763284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalizing Translation Models in the Probabilistic Relevance Framework","authors":"Navid Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon","doi":"10.1145/2983323.2983833","DOIUrl":"https://doi.org/10.1145/2983323.2983833","url":null,"abstract":"A recurring question in information retrieval is whether term associations can be properly integrated in traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. This generalisation is a de facto transformation of the translation models from Language Modelling to the probabilistic models. In doing so, we observe a potential limitation of these generalised translation models: they only affect the term frequency based components of all the models, ignoring changes in document and collection statistics. We correct this limitation by extending the translation models with the 15 statistics of term associations and provide extensive experimental results to demonstrate the benefit of the newly proposed methods. Additionally, we compare the translation models with query expansion methods based on the same term association resources, as well as based on Pseudo-Relevance Feedback (PRF). 
We observe that translation models always outperform the first, but provide complementary information with the second, such that by using PRF and our translation models together we observe results better than the current state of the art.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122091043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}