{"title":"TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding","authors":"Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh, Rui Fang","doi":"10.1145/2983323.2983325","DOIUrl":"https://doi.org/10.1145/2983323.2983325","url":null,"abstract":"Classifying tweets into topic categories is necessary and important for many applications, since tweets are about a variety of topics and users are only interested in certain topical areas. Many tweet classification approaches fail to achieve high accuracy due to the data sparseness issue. A tweet, as a special type of short text, has, in addition to its text, other metadata that can be used to enrich its context, such as user names, mentions, hashtags and embedded links. In this demonstration, we present TweetSift, an efficient and effective real-time tweet topic classifier. TweetSift exploits external tweet-specific entity knowledge to provide more topical context for a tweet, and integrates it with topic enhanced word embeddings for topic classification. The demonstration will show how TweetSift works and how it is incorporated into our social media event detection system.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"37 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116660240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Preserving Reachability Query Services for Massive Networks","authors":"Jiaxin Jiang, Peipei Yi, Byron Choi, Zhiwei Zhang, Xiaohui Yu","doi":"10.1145/2983323.2983799","DOIUrl":"https://doi.org/10.1145/2983323.2983799","url":null,"abstract":"This paper studies privacy-preserving reachability query services under the paradigm of data outsourcing. Specifically, graph data have been outsourced to a third-party service provider (SP), query clients submit their queries to the SP, and the SP returns the query answers to the clients. However, the SP may not always be trustworthy. Hence, this paper investigates protecting the structural information of the graph data and the query answers from the SP. Existing techniques are either insecure or not scalable. This paper proposes a privacy-preserving labeling, called ppTopo. To our knowledge, ppTopo is the first work that can produce a reachability index on massive networks and is secure against known plaintext attacks (KPA). Specifically, we propose a scalable index construction algorithm by employing the idea of topological folding, recently proposed by Cheng et al. We propose a novel asymmetric scalar product encryption in modulo 3 (ASPE3). It allows us to encrypt the index labels and transforms the queries into scalar products of encrypted labels. We perform an experimental study of the proposed technique on the SNAP networks. 
Compared with the existing methods, our results show that our technique is capable of producing the encrypted indexes at least 5 times faster for massive networks and the client's decryption time is 2-3 times smaller for most graphs.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121806205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment Domain Adaptation with Multi-Level Contextual Sentiment Knowledge","authors":"Fangzhao Wu, Sixing Wu, Yongfeng Huang, Songfang Huang, Yong Qin","doi":"10.1145/2983323.2983851","DOIUrl":"https://doi.org/10.1145/2983323.2983851","url":null,"abstract":"Sentiment domain adaptation is widely studied to tackle the domain-dependence problem in the sentiment analysis field. Existing domain adaptation methods usually train a sentiment classifier in a source domain and adapt it to the target domain using transfer learning techniques. However, when the sentiment feature distributions of the source and target domains are significantly different, the adaptation performance will heavily decline. In this paper, we propose a new sentiment domain adaptation approach by adapting the sentiment knowledge in general-purpose sentiment lexicons to a specific domain. Since the general sentiment words of general-purpose sentiment lexicons usually convey consistent sentiments in different domains, they have better generalization performance than the sentiment classifier trained in a source domain. In addition, we propose to extract various kinds of contextual sentiment knowledge from massive unlabeled samples in the target domain and formulate them as sentiment relations among sentiment expressions. It can propagate the sentiment information in general sentiment words to massive domain-specific sentiment expressions. Besides, we propose a unified framework to incorporate these different kinds of sentiment knowledge and learn an accurate domain-specific sentiment classifier for the target domain. Moreover, we propose an efficient optimization algorithm to solve the model of our approach. 
Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our approach.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114291419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Task-specific Short Document Expansion","authors":"Ramakrishna Bairi, Raghavendra Udupa, Ganesh Ramakrishnan","doi":"10.1145/2983323.2983811","DOIUrl":"https://doi.org/10.1145/2983323.2983811","url":null,"abstract":"Collections that contain a large number of short texts are becoming increasingly common (e.g., tweets, reviews, etc.). Analytical tasks (such as classification, clustering, etc.) involving short texts could be challenging due to the lack of context and owing to their sparseness. An often encountered problem is low accuracy on the task. A standard technique used in the handling of short texts is expanding them before subjecting them to the task. However, existing works on short text expansion suffer from certain limitations: (i) they depend on domain knowledge to expand the text; (ii) they employ task-specific heuristics; and (iii) the expansion procedure is tightly coupled to the task. This makes it hard to adapt a procedure, designed for one task, to another. We present an expansion technique -- TIDE (Task-specIfic short Document Expansion) -- that can be applied to several Machine Learning, NLP and Information Retrieval tasks on short texts (such as short text classification, clustering, entity disambiguation, and the like) without using task-specific heuristics and domain-specific knowledge for expansion. At the same time, our technique is capable of learning to expand short texts in a task-specific way. That is, the same technique that is applied to expand a short text in two different tasks is able to learn to produce different expansions depending upon what expansion benefits the task's performance. To speed up the learning process, we also introduce a technique called block learning. 
Our experiments with classification and clustering tasks show that our framework improves upon several baselines according to standard evaluation metrics, including accuracy and normalized mutual information (NMI).","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125104173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information","authors":"Hancheng Ge, James Caverlee, N. Zhang, A. Squicciarini","doi":"10.1145/2983323.2983782","DOIUrl":"https://doi.org/10.1145/2983323.2983782","url":null,"abstract":"Modeling, understanding, and predicting the spatio-temporal dynamics of online memes are important tasks, with ramifications on location-based services, social media search, targeted advertising and content delivery networks. However, the raw data revealing these dynamics are often incomplete and error-prone; for example, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on these dynamics. Hence, in this paper, we investigate new methods for uncovering the full (underlying) distribution through a novel spatio-temporal dynamics recovery framework which models the latent relationships among locations, memes, and times. By integrating these hidden relationships into a tensor-based recovery framework -- called AirCP -- we find that high-quality models of meme spread can be built with access to only a fraction of the full data. 
Experimental results on both synthetic and real-world Twitter hashtag data demonstrate the promising performance of the proposed framework: an average improvement of over 27% in recovering the spatio-temporal dynamics of hashtags versus five state-of-the-art alternatives.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122811477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Effective Stream Join Algorithm on Cloud System","authors":"Junhua Fang, Rong Zhang, Xiaotong Wang, T. Fu, Zhenjie Zhang, Aoying Zhou","doi":"10.1145/2983323.2983773","DOIUrl":"https://doi.org/10.1145/2983323.2983773","url":null,"abstract":"The matrix-based scheme (Join-Matrix) can perfectly support distributed stream joins, especially for arbitrary join predicates, because it guarantees that any tuples from two streams meet each other. However, the dynamic and unpredictable nature of streams requires quick action when the scheme changes. Otherwise, it may lead to degraded system throughput and increased processing latency, wasting system resources such as CPU and memory. Since the Join-Matrix model has a fixed processing architecture with replicated data, these adverse effects are magnified. Therefore, it is urgent to find a solution that preserves the advantages of the Join-Matrix model and makes good use of computation resources when the scheme changes. In this paper, we propose a cost-effective stream join algorithm, which ensures the adaptability of Join-Matrix but with lower resource consumption. Specifically, a varietal matrix generation algorithm is proposed to generate an irregular matrix scheme for assigning the minimal number of tasks; a lightweight migration algorithm is designed to ensure state migration at a low cost; and a complete load balance process framework is described to guarantee correctness during the scheme change. 
We conduct extensive experiments to compare our method with baseline systems on both benchmarks and real-workloads, and explain the results in detail.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127003276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble of Anchor Adapters for Transfer Learning","authors":"Fuzhen Zhuang, Ping Luo, Sinno Jialin Pan, Hui Xiong, Qing He","doi":"10.1145/2983323.2983690","DOIUrl":"https://doi.org/10.1145/2983323.2983690","url":null,"abstract":"In the past decade, there have been a large number of transfer learning algorithms proposed for various real-world applications. However, most of them are vulnerable to negative transfer since their performance is even worse than traditional supervised models. Aiming at more robust transfer learning models, we propose an ENsemble framework of anCHOR adapters (ENCHOR for short), in which an anchor adapter adapts the features of instances based on their similarities to a specific anchor (i.e., a selected instance). Specifically, the more similar to the anchor instance, the higher degree of the original feature of an instance remains unchanged in the adapted representation, and vice versa. This adapted representation for the data actually expresses the local structure around the corresponding anchor, and then any transfer learning method can be applied to this adapted representation for a prediction model, which focuses more on the neighborhood of the anchor. Next, based on multiple anchors, multiple anchor adapters can be built and combined into an ensemble for final output. Additionally, we develop an effective measure to select the anchors for ensemble building to achieve further performance improvement. Extensive experiments on hundreds of text classification tasks are conducted to demonstrate the effectiveness of ENCHOR. 
The results show that: when traditional supervised models perform poorly, ENCHOR (based on only 8 selected anchors) achieves 6%-13% increase in terms of average accuracy compared with the state-of-the-art methods, and it greatly alleviates negative transfer.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133204957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Spelling Correction for Eye-based Typing using domain-specific Information about Error Distribution","authors":"Raíza Hanada, M. G. Pimentel, Marco Cristo, Fernando Anglada Lores","doi":"10.1145/2983323.2983838","DOIUrl":"https://doi.org/10.1145/2983323.2983838","url":null,"abstract":"Spelling correction methods, widely used and researched, usually assume a low error probability and a small number of errors per word. These assumptions do not hold in very noisy input scenarios such as eye-based typing systems. In particular for eye typing, insertion errors are much more common than in traditional input systems, due to specific sources of noise such as the eye tracker device, particular user behaviors, and intrinsic characteristics of eye movements. The large number of common errors in such a scenario makes the use of traditional approaches unfeasible. Moreover, the lack of a large corpus of errors makes it hard to adopt probabilistic approaches based on information extracted from real world data. We address these problems by combining estimates extracted from general error corpora with domain-specific knowledge about eye-based input. Further, by relaxing restrictions on edit distance specifically related to insertion errors, we propose an algorithm that is able to find dictionary word candidates in an attainable time. 
We show that our method achieves good results to rank the correct word, given the input stream and similar space and time restrictions, when compared to the state-of-the-art baselines.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Self-Organizing Map for Identifying Influential Communities in Speech-based Networks","authors":"Sameen Mansha, F. Kamiran, Asim Karim, Aizaz Anwar","doi":"10.1145/2983323.2983885","DOIUrl":"https://doi.org/10.1145/2983323.2983885","url":null,"abstract":"Low-literate people are unable to use many mainstream social networks due to their text-based interfaces even though they constitute a major portion of the world population. Specialized speech-based networks (SBNs) are more accessible to low-literate users through their simple speech-based interfaces. While SBNs have the potential for providing value-adding services to a large segment of society they have been hampered by the need to operate in low-income segments on low budgets. The knowledge of influential users and communities in such networks can help in optimizing their operations. In this paper, we present a self-organizing map (SOM) for discovering and visualizing influential communities of users in SBNs. We demonstrate how a friendship graph is formed from call data records and present a method for estimating influences between users. Subsequently, we develop a SOM to cluster users based on their influence, thus identifying community-level influences and their roles in information propagation. We test our approach on Polly, an SBN developed for job ads dissemination among low-literate users. For comparison, we identify influential users with the benchmark greedy algorithm and relate them to the discovered communities. 
The results show that influential users are concentrated in influential communities and community-level information propagation provides a ready summary of influential users.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133854990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EnerQuery: Energy-Aware Query Processing","authors":"Amine Roukh, Ladjel Bellatreche, C. Ordonez","doi":"10.1145/2983323.2983334","DOIUrl":"https://doi.org/10.1145/2983323.2983334","url":null,"abstract":"Energy consumption is increasingly more important in large-scale query processing. This problem requires revisiting traditional query processing in actual DBMSs to identify the potential of energy saving, and to study the trade-offs between energy consumption and performance. In this paper, we propose EnerQuery, a tool built on top of a traditional DBMS to capitalize the efforts invested in building energy-aware query optimizers, which have the lion's share in energy consumption. Energy consumption is estimated on all query plan steps and integrated into a mathematical linear cost model used to select the best query plans. To increase end users' energy awareness, EnerQuery features a diagnostic GUI to visualize energy consumption per step and its savings when tuning key parameters during query execution.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130372353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}