Proceedings of the 25th ACM International on Conference on Information and Knowledge Management最新文献_第8页

TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding TweetSift:基于实体知识库和主题增强词嵌入的Tweet主题分类

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983325

Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh, Rui Fang

引用次数: 39

Privacy-Preserving Reachability Query Services for Massive Networks 大规模网络中保护隐私的可达性查询服务

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983799

Jiaxin Jiang, Peipei Yi, Byron Choi, Zhiwei Zhang, Xiaohui Yu

{"title":"Privacy-Preserving Reachability Query Services for Massive Networks","authors":"Jiaxin Jiang, Peipei Yi, Byron Choi, Zhiwei Zhang, Xiaohui Yu","doi":"10.1145/2983323.2983799","DOIUrl":"https://doi.org/10.1145/2983323.2983799","url":null,"abstract":"This paper studies privacy-preserving reachability query services under the paradigm of data outsourcing. Specifically, graph data have been outsourced to a third-party service provider (SP), query clients submit their queries to the (SP), and the (SP) returns the query answers to the clients. However, the (SP) may not always be trustworthy. Hence, this paper investigates protecting the structural information of the graph data and the query answers from the (SP). Existing techniques are either insecure or not scalable. This paper proposes a privacy-preserving labeling, called ppTopo. To our knowledge, ppTopo is the first work that can produce reachability index on massive networks and is secure against known plaintext attacks (KPA). Specifically, we propose a scalable index construction algorithm by employing the idea of topological folding, recently proposed by Cheng et al. We propose a novel asymmetric scalar product encryption in modulo 3 (ASPE3). It allows us to encrypt the index labels and transforms the queries into scalar products of encrypted labels. We perform an experimental study of the proposed technique on the SNAP networks. Compared with the existing methods, our results show that our technique is capable of producing the encrypted indexes at least 5 times faster for massive networks and the client's decryption time is 2-3 times smaller for most graphs.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121806205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Sentiment Domain Adaptation with Multi-Level Contextual Sentiment Knowledge 基于多层次语境情感知识的情感域自适应

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983851

Fangzhao Wu, Sixing Wu, Yongfeng Huang, Songfang Huang, Yong Qin

{"title":"Sentiment Domain Adaptation with Multi-Level Contextual Sentiment Knowledge","authors":"Fangzhao Wu, Sixing Wu, Yongfeng Huang, Songfang Huang, Yong Qin","doi":"10.1145/2983323.2983851","DOIUrl":"https://doi.org/10.1145/2983323.2983851","url":null,"abstract":"Sentiment domain adaptation is widely studied to tackle the domain-dependence problem in sentiment analysis field. Existing domain adaptation methods usually train a sentiment classifier in a source domain and adapt it to the target domain using transfer learning techniques. However, when the sentiment feature distributions of the source and target domains are significantly different, the adaptation performance will heavily decline. In this paper, we propose a new sentiment domain adaptation approach by adapting the sentiment knowledge in general-purpose sentiment lexicons to a specific domain. Since the general sentiment words of general-purpose sentiment lexicons usually convey consistent sentiments in different domains, they have better generalization performance than the sentiment classifier trained in a source domain. In addition, we propose to extract various kinds of contextual sentiment knowledge from massive unlabeled samples in target domain and formulate them as sentiment relations among sentiment expressions. It can propagate the sentiment information in general sentiment words to massive domain-specific sentiment expressions. Besides, we propose a unified framework to incorporate these different kinds of sentiment knowledge and learn an accurate domain-specific sentiment classifier for target domain. Moreover, we propose an efficient optimization algorithm to solve the model of our approach. Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our approach.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114291419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

A Framework for Task-specific Short Document Expansion 特定任务的短文档扩展框架

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983811

Ramakrishna Bairi, Raghavendra Udupa, Ganesh Ramakrishnan

{"title":"A Framework for Task-specific Short Document Expansion","authors":"Ramakrishna Bairi, Raghavendra Udupa, Ganesh Ramakrishnan","doi":"10.1145/2983323.2983811","DOIUrl":"https://doi.org/10.1145/2983323.2983811","url":null,"abstract":"Collections that contain a large number of short texts are becoming increasingly common (eg., tweets, reviews, etc). Analytical tasks (such as classification, clustering, etc.) involving short texts could be challenging due to the lack of context and owing to their sparseness. An often encountered problem is low accuracy on the task. A standard technique used in the handling of short texts is expanding them before subjecting them to the task. However, existing works on short text expansion suffer from certain limitations: (i) they depend on domain knowledge to expand the text; (ii) they employ task-specific heuristics; and (iii) the expansion procedure is tightly coupled to the task. This makes it hard to adapt a procedure, designed for one task, into another. We present an expansion technique -- TIDE (Task-specIfic short Document Expansion) -- that can be applied on several Machine Learning, NLP and Information Retrieval tasks on short texts (such as short text classification, clustering, entity disambiguation, and the like) without using task specific heuristics and domain-specific knowledge for expansion. At the same time, our technique is capable of learning to expand short texts in a task-specific way. That is, the same technique that is applied to expand a short text in two different tasks is able to learn to produce different expansions depending upon what expansion benefits the task's performance. To speed up the learning process, we also introduce a technique called block learning. Our experiments with classification and clustering tasks show that our framework improves upon several baselines according to the standard evaluation metrics which includes the accuracy and normalized mutual information (NMI).","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125104173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information 不完全信息下模因时空动态的揭示

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983782

Hancheng Ge, James Caverlee, N. Zhang, A. Squicciarini

{"title":"Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information","authors":"Hancheng Ge, James Caverlee, N. Zhang, A. Squicciarini","doi":"10.1145/2983323.2983782","DOIUrl":"https://doi.org/10.1145/2983323.2983782","url":null,"abstract":"Modeling, understanding, and predicting the spatio-temporal dynamics of online memes are important tasks, with ramifications on location-based services, social media search, targeted advertising and content delivery networks. However, the raw data revealing these dynamics are often incomplete and error-prone; for example, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on these dynamics. Hence, in this paper, we investigate new methods for uncovering the full (underlying) distribution through a novel spatio-temporal dynamics recovery framework which models the latent relationships among locations, memes, and times. By integrating these hidden relationships into a tensor-based recovery framework -- called AirCP -- we find that high-quality models of meme spread can be built with access to only a fraction of the full data. Experimental results on both synthetic and real-world Twitter hashtag data demonstrate the promising performance of the proposed framework: an average improvement of over 27% in recovering the spatio-temporal dynamics of hashtags versus five state-of-the-art alternatives.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122811477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

Cost-Effective Stream Join Algorithm on Cloud System 云系统上的高性价比流连接算法

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983773

Junhua Fang, Rong Zhang, Xiaotong Wang, T. Fu, Zhenjie Zhang, Aoying Zhou

{"title":"Cost-Effective Stream Join Algorithm on Cloud System","authors":"Junhua Fang, Rong Zhang, Xiaotong Wang, T. Fu, Zhenjie Zhang, Aoying Zhou","doi":"10.1145/2983323.2983773","DOIUrl":"https://doi.org/10.1145/2983323.2983773","url":null,"abstract":"Matrix-based scheme (Join-Matrix) can prefectly support distributed stream joins, especially for arbitrary join predicates, because it guarantees any tuples from two streams to meet with each other. However,the dynamics and unpredictability features of stream require quick actions on scheme changing. Otherwise, they may lead to degradation of system throughputs and increament of processing latency with the waste of system resources, such as CPUs and Memories. Since Join-Matrix model has the fixed processing architecture with replicated data, these kinds of adverseness will be magnified. Therefore, it is urgent to find a solution that preserves advantages of Join-Matrix model and promises a good usage to computation resources when it meets scheme changing. In this paper, we propose a cost-effective stream join algorithm, which ensures the adaptability of Join-Matrix but with lower resources consumption. Specifically, a varietal matrix generation algorithm is proposed to generate an irregular matrix scheme for assigning the minimal number of tasks; a lightweight migration algorithm is designed to ensure state migration at a low cost; a complete load balance process framework is described to guarantee the correctness during the scheme changing. We conduct extensive experiments to compare our method with baseline systems on both benchmarks and real-workloads, and explain the results in detail.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127003276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Ensemble of Anchor Adapters for Transfer Learning 用于迁移学习的锚点适配器集成

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983690

Fuzhen Zhuang, Ping Luo, Sinno Jialin Pan, Hui Xiong, Qing He

{"title":"Ensemble of Anchor Adapters for Transfer Learning","authors":"Fuzhen Zhuang, Ping Luo, Sinno Jialin Pan, Hui Xiong, Qing He","doi":"10.1145/2983323.2983690","DOIUrl":"https://doi.org/10.1145/2983323.2983690","url":null,"abstract":"In the past decade, there have been a large number of transfer learning algorithms proposed for various real-world applications. However, most of them are vulnerable to negative transfer since their performance is even worse than traditional supervised models. Aiming at more robust transfer learning models, we propose an ENsemble framework of anCHOR adapters (ENCHOR for short), in which an anchor adapter adapts the features of instances based on their similarities to a specific anchor (i.e., a selected instance). Specifically, the more similar to the anchor instance, the higher degree of the original feature of an instance remains unchanged in the adapted representation, and vice versa. This adapted representation for the data actually expresses the local structure around the corresponding anchor, and then any transfer learning method can be applied to this adapted representation for a prediction model, which focuses more on the neighborhood of the anchor. Next, based on multiple anchors, multiple anchor adapters can be built and combined into an ensemble for final output. Additionally, we develop an effective measure to select the anchors for ensemble building to achieve further performance improvement. Extensive experiments on hundreds of text classification tasks are conducted to demonstrate the effectiveness of ENCHOR. The results show that: when traditional supervised models perform poorly, ENCHOR (based on only 8 selected anchors) achieves $6%-13%$ increase in terms of average accuracy compared with the state-of-the-art methods, and it greatly alleviates negative transfer.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133204957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Effective Spelling Correction for Eye-based Typing using domain-specific Information about Error Distribution 使用有关错误分布的领域特定信息对基于眼睛的打字进行有效的拼写校正

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983838

Raíza Hanada, M. G. Pimentel, Marco Cristo, Fernando Anglada Lores

{"title":"Effective Spelling Correction for Eye-based Typing using domain-specific Information about Error Distribution","authors":"Raíza Hanada, M. G. Pimentel, Marco Cristo, Fernando Anglada Lores","doi":"10.1145/2983323.2983838","DOIUrl":"https://doi.org/10.1145/2983323.2983838","url":null,"abstract":"Spelling correction methods, widely used and researched, usually assume a low error probability and a small number of errors per word. These assumptions do not hold in very noisy input scenarios such as eye-based typing systems. In particular for eye typing, insertion errors are much more common than in traditional input systems, due to specific sources of noise such as the eye tracker device, particular user behaviors, and intrinsic characteristics of eye movements. The large number of common errors in such a scenario makes the use of traditional approaches unfeasible. Moreover, the lack of a large corpus of errors makes it hard to adopt probabilistic approaches based on information extracted from real world data. We address these problems by combining estimates extracted from general error corpora with domain-specific knowledge about eye-based input. Further, by relaxing restrictions on edit distance specifically related to insertion errors, we propose an algorithm that is able to find dictionary word candidates in an attainable time. We show that our method achieves good results to rank the correct word, given the input stream and similar space and time restrictions, when compared to the state-of-the-art baselines.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

A Self-Organizing Map for Identifying InfluentialCommunities in Speech-based Networks 基于语音的网络中识别有影响力社区的自组织地图

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983885

Sameen Mansha, F. Kamiran, Asim Karim, Aizaz Anwar

{"title":"A Self-Organizing Map for Identifying InfluentialCommunities in Speech-based Networks","authors":"Sameen Mansha, F. Kamiran, Asim Karim, Aizaz Anwar","doi":"10.1145/2983323.2983885","DOIUrl":"https://doi.org/10.1145/2983323.2983885","url":null,"abstract":"Low-literate people are unable to use many mainstream social networks due to their text-based interfaces even though they constitute a major portion of the world population. Specialized speech-based networks (SBNs) are more accessible to low-literate users through their simple speech-based interfaces. While SBNs have the potential for providing value-adding services to a large segment of society they have been hampered by the need to operate in low-income segments on low budgets. The knowledge of influential users and communities in such networks can help in optimizing their operations. In this paper, we present a self-organizing map (SOM) for discovering and visualizing influential communities of users in SBNs. We demonstrate how a friendship graph is formed from call data records and present a method for estimating influences between users. Subsequently, we develop a SOM to cluster users based on their influence, thus identifying community-level influences and their roles in information propagation. We test our approach on Polly, a SBN developed for job ads dissemination among low-literate users. For comparison, we identify influential users with the benchmark greedy algorithm and relate them to the discovered communities. The results show that influential users are concentrated in influential communities and community-level information propagation provides a ready summary of influential users.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133854990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

EnerQuery: Energy-Aware Query Processing EnerQuery:能量感知查询处理

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI: 10.1145/2983323.2983334

Amine Roukh, Ladjel Bellatreche, C. Ordonez

引用次数: 16