Scott Freitas, Hanghang Tong, Nan Cao, Yinglong Xia
{"title":"Rapid Analysis of Network Connectivity","authors":"Scott Freitas, Hanghang Tong, Nan Cao, Yinglong Xia","doi":"10.1145/3132847.3133170","DOIUrl":"https://doi.org/10.1145/3132847.3133170","url":null,"abstract":"This research focuses on reducing the computation time of two base network algorithms (k-simple shortest paths and minimum spanning tree for a subset of nodes), cornerstones behind a variety of network connectivity mining tasks, with the goal of rapidly finding network pathways and trees using a set of user-specific query nodes. To facilitate this process we utilize: (1) multi-threaded algorithm variations, (2) network re-use for subsequent queries and (3) a novel algorithm, Key Neighboring Vertices (KNV), to reduce the network search space. The proposed KNV algorithm serves a dual purpose: (a) to reduce the computation time for algorithmic analysis and (b) to identify key vertices in the network. Empirical results indicate this combination of techniques significantly improves the baseline performance of both algorithms. We have also developed a web platform utilizing the proposed network algorithms to enable researchers and practitioners to both visualize and interact with their datasets (PathFinder: http://www.path-finder.io).","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"261 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78403631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Theodore Vasiloudis, F. Beligianni, G. D. F. Morales
{"title":"BoostVHT: Boosting Distributed Streaming Decision Trees","authors":"Theodore Vasiloudis, F. Beligianni, G. D. F. Morales","doi":"10.1145/3132847.3132974","DOIUrl":"https://doi.org/10.1145/3132847.3132974","url":null,"abstract":"Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design neatly separates the model boosting from its training. As a result, BoostVHT provides a flexible learning framework which can employ any existing online boosting algorithm, while at the same time it can leverage the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can be run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state-of-the-art.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79077418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Email Volume For Sitewide Engagement","authors":"Rupesh Gupta, Guanfeng Liang, Rómer Rosales","doi":"10.1145/3132847.3132849","DOIUrl":"https://doi.org/10.1145/3132847.3132849","url":null,"abstract":"In this paper we focus on the problem of optimizing email volume for maximizing sitewide engagement of an online social networking service. Email volume optimization approaches published in the past have proposed optimization of email volume for maximization of engagement metrics which are impacted exclusively by email; for example, the number of sessions that begin with clicks on links within emails. The impact of email on such downstream engagement metrics can be estimated easily because of the ease of attribution of such an engagement event to an email. However, this framework is limited in its view of the ecosystem of the networking service, which comprises several tools and utilities that contribute towards delivering value to members, with email being just one such utility. Thus, in this paper we depart from previous approaches by exploring and optimizing the contribution of email to this ecosystem. In particular, we present and contrast the differential impact of email on sitewide engagement metrics for various types of users. We propose a new email volume optimization approach which maximizes sitewide engagement metrics, such as the total number of active users. This is in sharp contrast to the previous approaches whose objective has been maximization of downstream engagement metrics. We present details of our prediction function for predicting the impact of emails on a user's activeness on the mobile or web application. We describe how certain approximations to this prediction function can be made for solving the volume optimization problem, and present results from online A/B tests.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"151 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79507874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supratim Das, Arunav Mishra, K. Berberich, Vinay Setty
{"title":"Estimating Event Focus Time Using Neural Word Embeddings","authors":"Supratim Das, Arunav Mishra, K. Berberich, Vinay Setty","doi":"10.1145/3132847.3133131","DOIUrl":"https://doi.org/10.1145/3132847.3133131","url":null,"abstract":"Time associated with news events has been leveraged as a complementary dimension to text in several applications such as temporal information retrieval, news event linking, etc. Short textual event descriptions (e.g., single sentences) are prevalent in web documents (also considered as inputs in the above applications) and often lack explicit temporal expressions for grounding them to a precise time period. For example, the event description, \"France swears in Emmanuel Macron as the 25th President\", lacks temporal cues to indicate that the event occurred in the year \"2017\". Thus, we address the problem of estimating event focus time defined as a time interval with maximum association thereby indicating its occurrence period. We propose several estimators that leverage distributional event and time representations learned from large external document collections by adapting the word2vec paradigm. Extensive experiments using two real-world datasets and 100 Wikipedia events show that our method outperforms several state-of-the-art baselines.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79531556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collecting Non-Geotagged Local Tweets via Bandit Algorithms","authors":"Saki Ueda, Yuto Yamaguchi, H. Kitagawa","doi":"10.1145/3132847.3133046","DOIUrl":"https://doi.org/10.1145/3132847.3133046","url":null,"abstract":"How can we collect as many non-geotagged tweets as possible from users in a specific location within a limited time span? How can we find such users if we do not have much information about the specified location? Although there is a variety of methods to estimate the locations of users, these methods are not directly applicable to this problem because they require collecting a large number of random tweets and then filtering them to obtain a small number of tweets from such users. In this paper, we propose a framework that incrementally finds such users and continuously collects tweets from them. Our framework is based on the bandit algorithm that adjusts the trade-off between exploration and exploitation; in other words, it simultaneously finds new users in the specified location and collects tweets from already-found users. The experimental results show that the bandit algorithm works well on this problem and outperforms the carefully-designed baselines.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78654131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Framework for Computing Entity Relatedness in Wikipedia","authors":"Marco Ponza, P. Ferragina, Soumen Chakrabarti","doi":"10.1145/3132847.3132890","DOIUrl":"https://doi.org/10.1145/3132847.3132890","url":null,"abstract":"Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in recent literature based on Wikipedia as the knowledge graph. No clear dominance is seen between measures based on textual similarity and graph proximity. Some of the better measures involve expensive global graph computations. We then propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived based on computations on this subgraph. Our system shows better agreement with human judgment than existing proposals both on the new dataset and on an established one. We also plug our relatedness algorithm into a state-of-the-art entity linker and observe an increase in its accuracy and robustness.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73851322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Matrix-Vector Recurrent Unit Model for Capturing Compositional Semantics in Phrase Embeddings","authors":"Rui Wang, Wei Liu, C. McDonald","doi":"10.1145/3132847.3132984","DOIUrl":"https://doi.org/10.1145/3132847.3132984","url":null,"abstract":"The meaning of a multi-word phrase depends not only on the meaning of its constituent words, but also on the rules for composing them, which give the so-called compositional semantics. However, many deep learning models for learning compositional semantics target specific NLP tasks such as sentiment classification. Consequently, the word embeddings encode the lexical semantics, while the weights of the networks are optimised for the classification task. Such models have no mechanisms to explicitly encode the compositional rules, and hence they are insufficient in capturing the semantics of phrases. We present a novel recurrent computational mechanism that specifically learns the compositionality by encoding the compositional rule of each word into a matrix. The network uses a recurrent architecture to capture the order of words for phrases with various lengths without requiring extra preprocessing such as part-of-speech tagging. The model is thoroughly evaluated on both supervised and unsupervised NLP tasks including phrase similarity, noun-modifier questions, sentiment distribution prediction, and domain specific term identification tasks. We demonstrate that our model consistently outperforms the LSTM and CNN deep learning models, simple algebraic compositions, and other popular baselines on different datasets.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74656988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, B. C. Grau, Egor V. Kostylev, Ian Horrocks
{"title":"SemFacet: Making Hard Faceted Search Easier","authors":"E. Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, B. C. Grau, Egor V. Kostylev, Ian Horrocks","doi":"10.1145/3132847.3133192","DOIUrl":"https://doi.org/10.1145/3132847.3133192","url":null,"abstract":"Faceted search is a prominent search paradigm that became the standard in many Web applications and has also been recently proposed as a suitable paradigm for exploring and querying RDF graphs. One of the main challenges that hamper the usability of faceted search systems, especially in the RDF context, is information overload, that is, when the size of faceted interfaces becomes comparable to the size of the data over which the search is performed. In this demo we present (an extension of) our faceted search system SemFacet and focus on features that address the information overload: ranking, aggregation, and reachability. The demo attendees will be able to try our system on an RDF graph that models online shopping over a catalog with up to millions of products.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75110066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval","authors":"Avikalp Srivastava, Madhav Datt","doi":"10.1145/3132847.3133162","DOIUrl":"https://doi.org/10.1145/3132847.3133162","url":null,"abstract":"Semantic similarity based retrieval is playing an increasingly important role in many IR systems such as modern web search, question-answering, similar document retrieval etc. Improvements in retrieval of semantically similar content are very significant to applications like Quora, Stack Overflow, Siri etc. We propose a novel unsupervised model for semantic similarity based content retrieval, where we construct semantic flow graphs for each query, and introduce the concept of \"soft seeding\" in graph based semi-supervised learning (SSL) to convert this into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models, and produces comparable results to the best supervised models. Our research provides a method to tackle semantic similarity based retrieval without any training data, and allows seamless extension to different domain QA communities, as well as to other semantic equivalence tasks.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"224 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75811381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xian-Ping Deng, C. Cui, Huidi Fang, Xiushan Nie, Yilong Yin
{"title":"Personalized Image Aesthetics Assessment","authors":"Xian-Ping Deng, C. Cui, Huidi Fang, Xiushan Nie, Yilong Yin","doi":"10.1145/3132847.3133052","DOIUrl":"https://doi.org/10.1145/3132847.3133052","url":null,"abstract":"Automatically assessing image quality from an aesthetic perspective is of great interest to the high-level vision research community. Existing methods are typically non-personalized and quantify image aesthetics with a universal label. However, because aesthetics is a subjective perception, understanding user aesthetic perceptions poses a formidable challenge to image aesthetics assessment. In this paper, we propose to model user aesthetic perceptions using a set of exemplar images from social media platforms, and realize personalized aesthetics assessment by transferring this knowledge to adapt the results of the trained generic model. In this way, image aesthetics is measured in terms of both visual quality and user tastes. Extensive experiments on two benchmark datasets verify the potential of our approach for personalized image aesthetics assessment.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75839018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}