Proceedings of the 21st ACM international conference on Information and knowledge management: latest articles, page 2

Improving document clustering using automated machine translation

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396844

Xiang Wang, B. Qian, I. Davidson

Abstract: With the development of statistical machine translation, we have ready-to-use tools that can translate documents from one language to many other languages. These translations provide different yet correlated views of the same set of documents. This gives rise to an intriguing question: can we use the extra information to achieve a better clustering of the documents? Some recent work on multiview clustering provided positive answers to this question. In this work, we propose an alternative approach to address this problem using the constrained clustering framework. Unlike traditional Must-Link and Cannot-Link constraints, the constraints generated from machine translation are dense yet noisy. We show how to incorporate this type of constraint by presenting two algorithms, one parametric and one non-parametric. Our algorithms are easy to implement, efficient, and can consistently improve the clustering of real data, namely the Reuters RCV1/RCV2 Multilingual Dataset. In contrast to existing multiview clustering algorithms, our technique does not need the compatibility or the conditional independence assumption, nor does it involve subtle parameter tuning.

Citations: 15

MOUNA: mining opinions to unveil neglected arguments

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398739

Mouna Kacimi, J. Gamper

Abstract: A query topic can be subjective, involving a variety of opinions, judgments, arguments, and many other debatable aspects. Typically, search engines process queries independently of the nature of their topics, using a relevance-based retrieval strategy. Hence, search results about subjective topics are often biased towards a specific viewpoint or version. In this demo, we shall present MOUNA, a novel approach for opinion diversification. Given a query on a subjective topic, MOUNA ranks search results based on three scores: (1) relevance of documents, (2) semantic diversity to avoid redundancy and capture the different arguments used to discuss the query topic, and (3) sentiment diversity to cover a balanced set of documents having positive, negative, and neutral sentiments about the query topic. Moreover, MOUNA enhances the representation of search results with a summary of the different arguments and sentiments related to the query topic. Thus, the user can navigate through the results and explore the links between them. We provide an example scenario in this demonstration to illustrate the inadequacy of relevance-based techniques for searching subjective topics and highlight the innovative aspects of MOUNA. A video showing the demo can be found at http://www.youtube.com/user/mounakacimi/videos .
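
The three-score ranking described in the MOUNA abstract can be illustrated with a small greedy re-ranker. This is only a sketch under stated assumptions: the abstract does not say how the scores are computed or combined, so the linear weighting, the Jaccard-based semantic diversity, the sentiment-balance term, and every name below (`Doc`, `rerank`, `w_rel`, `w_sem`, `w_sent`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevance: float   # relevance to the query, assumed to lie in [0, 1]
    terms: frozenset   # terms standing in for the arguments a document uses
    sentiment: str     # "positive", "negative", or "neutral"


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rerank(docs, k, w_rel=0.5, w_sem=0.3, w_sent=0.2):
    """Greedily pick k documents, trading relevance against two diversity terms.

    The weights and the linear combination are assumptions for illustration.
    """
    selected = []
    remaining = list(docs)
    while remaining and len(selected) < k:

        def score(d):
            # Semantic diversity: distance to the closest already-selected document.
            sem = min(
                (jaccard_distance(d.terms, s.terms) for s in selected),
                default=1.0,
            )
            # Sentiment diversity: reward sentiments that are under-represented so far.
            if selected:
                same = sum(1 for s in selected if s.sentiment == d.sentiment)
                sent = 1.0 - same / len(selected)
            else:
                sent = 1.0
            return w_rel * d.relevance + w_sem * sem + w_sent * sent

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate positive documents and one less relevant negative document, this sketch selects the negative document second, which is the kind of behaviour a pure relevance ranking would not produce.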

Citations: 8

Concavity in IR models

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398686

S. Clinchant

Citations: 2

Indexing uncertain spatio-temporal data

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396813

Tobias Emrich, H. Kriegel, N. Mamoulis, M. Renz, Andreas Züfle

Citations: 32

A tag-centric discriminative model for web objects classification

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398612

Lina Yao, Quan Z. Sheng

Citations: 1

Unsupervised discovery of opposing opinion networks from forum discussions

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398489

Yue Lu, Hongning Wang, ChengXiang Zhai, D. Roth

Citations: 36

CGStream: continuous correlated graph query for data streams

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398419

Shirui Pan, Xingquan Zhu

Abstract: In this paper, we propose to query correlated graphs in a data stream scenario, where, given a query graph q, an algorithm is required to retrieve all the subgraphs whose Pearson's correlation coefficients with q are greater than a threshold Θ over some graph data flowing in a stream fashion. Due to the dynamically changing nature of the stream data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. In the paper, we propose a novel algorithm, CGStream, to identify correlated graphs from a data stream by using a sliding window which covers a number of consecutive batches of stream data records. Our theme is to regard a stream query as a traversal along a data stream, with the query answered at a number of outlooks over the data stream. For each outlook, we derive a lower frequency bound to mine a set of frequent subgraph candidates, where the lower bound guarantees that no pattern is missing from the current outlook to the next outlook. On top of that, we derive an upper correlation bound and a heuristic rule to prune the candidate set, which helps reduce the computation cost at each outlook. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithm. Meanwhile, our algorithm achieves good performance in terms of query precision.
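
To make the quantity in the CGStream abstract concrete, the sketch below computes Pearson's correlation between two occurrence indicators (does a record contain the query graph, does it contain a candidate subgraph) over a sliding window of batches. It assumes the containment tests have already been done elsewhere and arrive as boolean pairs; it does not implement the paper's frequency bound, correlation bound, or pruning heuristic. The class and function names are hypothetical.

```python
from collections import deque
from math import sqrt


def pearson_binary(n, n_a, n_b, n_ab):
    """Pearson correlation of two binary indicators from their counts.

    n: records in the window; n_a, n_b: records containing each graph;
    n_ab: records containing both.
    """
    if n == 0:
        return 0.0
    sa, sb, sab = n_a / n, n_b / n, n_ab / n
    denom = sqrt(sa * (1 - sa) * sb * (1 - sb))
    if denom == 0:
        # Assumption: a graph present in all or none of the records is
        # reported as uncorrelated rather than raising an error.
        return 0.0
    return (sab - sa * sb) / denom


class SlidingWindowCorrelation:
    """Keeps per-batch counts for the most recent `max_batches` batches."""

    def __init__(self, max_batches):
        # deque(maxlen=...) silently drops the oldest batch when full.
        self.batches = deque(maxlen=max_batches)

    def add_batch(self, batch):
        """batch: iterable of (has_query, has_candidate) boolean pairs."""
        pairs = list(batch)
        n_a = sum(1 for a, _ in pairs if a)
        n_b = sum(1 for _, b in pairs if b)
        n_ab = sum(1 for a, b in pairs if a and b)
        self.batches.append((len(pairs), n_a, n_b, n_ab))

    def correlation(self):
        if not self.batches:
            return 0.0
        totals = [sum(col) for col in zip(*self.batches)]
        return pearson_binary(*totals)
```

Storing only four counts per batch means that sliding the window costs one append, whatever the batch size; a candidate would be reported when `correlation()` exceeds the threshold Θ.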

Citations: 10

What is the IQ of your data transformation system?

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2396872

G. Mecca, Paolo Papotti, Salvatore Raunich, Donatello Santoro

Citations: 17

From sBoW to dCoT marginalized encoders for text representation

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398536

Z. Xu, Minmin Chen, Kilian Q. Weinberger, Fei Sha

Abstract: In text mining, information retrieval, and machine learning, text documents are commonly represented through variants of sparse Bag of Words (sBoW) vectors (e.g. TF-IDF [1]). Although simple and intuitive, sBoW-style representations suffer from their inherent over-sparsity and fail to capture word-level synonymy and polysemy. Especially when labeled data is limited (e.g. in document classification), or the text documents are short (e.g. emails or abstracts), many features are rarely observed within the training corpus. This leads to overfitting and reduced generalization accuracy. In this paper we propose Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features. dCoT explicitly models absent words by removing and reconstructing random subsets of words in the unlabeled corpus. With this approach, dCoT learns to reconstruct frequent words from co-occurring infrequent words and maps the high-dimensional sparse sBoW vectors into a low-dimensional dense representation. We show that the feature removal can be marginalized out and that the reconstruction can be solved for in closed form. We demonstrate empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks.
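
The claim that feature removal "can be marginalized out" with a closed-form solution can be illustrated with a generic derivation. The abstract gives no formulas, so the setup below is an assumption rather than the paper's exact objective: a squared reconstruction loss, a data matrix $X$ with one document per column, and each feature removed independently with probability $p$. The symbols $b_i$, $S$, $P$, and $Q$ are introduced here for illustration.

```latex
% Assumed setup (not from the abstract): squared loss, independent removal
% of each feature with probability p, and Q invertible.
\begin{align}
\tilde{x}_i &= b_i\, x_i, \qquad b_i \sim \mathrm{Bernoulli}(1-p) \\
W^{\ast} &= \arg\min_{W}\; \mathbb{E}\,\bigl\lVert X - W\tilde{X} \bigr\rVert_F^2
          \;=\; P\,Q^{-1} \\
P &= \mathbb{E}\bigl[X\tilde{X}^{\top}\bigr] = (1-p)\,S, \qquad S = XX^{\top} \\
Q_{ij} &= \mathbb{E}\bigl[\tilde{X}\tilde{X}^{\top}\bigr]_{ij} =
  \begin{cases}
    (1-p)^2\, S_{ij} & i \neq j \\
    (1-p)\, S_{ii}   & i = j
  \end{cases}
\end{align}
```

The diagonal of $Q$ differs from the off-diagonal because $\mathbb{E}[b_i^2] = 1-p$ while $\mathbb{E}[b_i b_j] = (1-p)^2$ for $i \neq j$. Under these assumptions no corrupted copies of the corpus need to be sampled: the expectation over all removals is absorbed into $P$ and $Q$.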

Citations: 26

CrowdTiles: presenting crowd-based information for event-driven information needs

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398731

S. Whiting, K. Zhou, J. Jose, Omar Alonso, Teerapong Leelanupab

Abstract: Time plays a central role in many web search information needs relating to recent events. For recency queries where fresh information is most desirable, there is likely to be a great deal of highly relevant information created very recently by crowds of people across the world, particularly on platforms such as Wikipedia and Twitter. With so many users, mainstream events are often very quickly reflected in these sources. The English Wikipedia encyclopedia consists of a vast collection of user-edited articles covering a range of topics. During events, users collaboratively create and edit existing articles in near real-time. Simultaneously, users on Twitter disseminate and discuss event details, with a small number of users becoming influential for the topic. In this demo, we propose a novel approach to presenting a summary of new information and users related to recent or ongoing events associated with the user's search topic, thereby aiding discovery of the most recent information. We outline methods to detect search topics which are driven by events, identify and extract changing Wikipedia article passages, and find influential Twitter users. Using these, we provide a system which displays familiar tiles in search results to present recent changes in the event-related Wikipedia articles, as well as Twitter users who have tweeted recent relevant information about the event topics.

Citations: 11