Proceedings of the 21st ACM international conference on Information and knowledge management最新文献

筛选
英文 中文
Improving document clustering using automated machine translation 使用自动机器翻译改进文档聚类
Xiang Wang, B. Qian, I. Davidson
{"title":"Improving document clustering using automated machine translation","authors":"Xiang Wang, B. Qian, I. Davidson","doi":"10.1145/2396761.2396844","DOIUrl":"https://doi.org/10.1145/2396761.2396844","url":null,"abstract":"With the development of statistical machine translation, we have ready-to-use tools that can translate documents from one language to many other languages. These translations provide different yet correlated views of the same set of documents. This gives rise to an intriguing question: can we use the extra information to achieve a better clustering of the documents? Some recent work on multiview clustering provided positive answers to this question. In this work, we propose an alternative approach to address this problem using the constrained clustering framework. Unlike traditional Must-Link and Cannot-Link constraints, the constraints generated from machine translation are dense yet noisy. We show how to incorporate this type of constraints by presenting two algorithms, one parametric and one non-parametric. Our algorithms are easy to implement, efficient, and can consistently improve the clustering of real data, namely the Reuters RCV1/RCV2 Multilingual Dataset. In contrast to existing multiview clustering algorithms, our technique does not need the compatibility or the conditional independence assumption, nor does it involve subtle parameter tuning.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116247776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
MOUNA: mining opinions to unveil neglected arguments 挖掘观点,揭示被忽视的论点
Mouna Kacimi, J. Gamper
{"title":"MOUNA: mining opinions to unveil neglected arguments","authors":"Mouna Kacimi, J. Gamper","doi":"10.1145/2396761.2398739","DOIUrl":"https://doi.org/10.1145/2396761.2398739","url":null,"abstract":"A query topic can be subjective involving a variety of opinions, judgments, arguments, and many other debatable aspects. Typically, search engines process queries independently from the nature of their topics using a relevance-based retrieval strategy. Hence, search results about subjective topics are often biased towards a specific view point or version. In this demo, we shall present MOUNA, a novel approach for opinion diversification. Given a query on a subjective topic, MOUNA ranks search results based on three scores: (1) relevance of documents, (2) semantic diversity to avoid redundancy and capture the different arguments used to discuss the query topic, and (3) sentiment diversity to cover a balanced set of documents having positive, negative, and neutral sentiments about the query topic. Moreover, MOUNA enhances the representation of search results with a summary of the different arguments and sentiments related to the query topic. Thus, the user can navigate through the results and explore the links between them. We provide an example scenario in this demonstration to illustrate the inadequacy of relevance-based techniques for searching subjective topics and highlight the innovative aspects of MOUNA. A video showing the demo can be found in http://www.youtube.com/user/mounakacimi/videos .","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116563086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Concavity in IR models 红外模型的凹凸性
S. Clinchant
{"title":"Concavity in IR models","authors":"S. Clinchant","doi":"10.1145/2396761.2398686","DOIUrl":"https://doi.org/10.1145/2396761.2398686","url":null,"abstract":"We study the impact of concavity in IR models and propose to use a generalized logarithm function, the n-logarithm to weight words in documents. We extend the family of information based Information Retrieval (IR) models with this function. We show that that concavity is indeed an important property of IR models. Experiments conducted for IR tasks, Latent Semantic Indexing and Text Categorization show improvements.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122309732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Indexing uncertain spatio-temporal data 索引不确定的时空数据
Tobias Emrich, H. Kriegel, N. Mamoulis, M. Renz, Andreas Züfle
{"title":"Indexing uncertain spatio-temporal data","authors":"Tobias Emrich, H. Kriegel, N. Mamoulis, M. Renz, Andreas Züfle","doi":"10.1145/2396761.2396813","DOIUrl":"https://doi.org/10.1145/2396761.2396813","url":null,"abstract":"The advances in sensing and telecommunication technologies allow the collection and management of vast amounts of spatio-temporal data combining location and time information.Due to physical and resource limitations of data collection devices (e.g., RFID readers, GPS receivers and other sensors) data are typically collected only at discrete points of time. In-between these discrete time instances, the positions of tracked moving objects are uncertain. In this work, we propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using an hierarchical index structure.To the best of our knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. We experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122567383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
A tag-centric discriminative model for web objects classification 一个以标签为中心的web对象分类判别模型
Lina Yao, Quan Z. Sheng
{"title":"A tag-centric discriminative model for web objects classification","authors":"Lina Yao, Quan Z. Sheng","doi":"10.1145/2396761.2398612","DOIUrl":"https://doi.org/10.1145/2396761.2398612","url":null,"abstract":"This paper studies web object classification problem with the novel exploration of social tags. More and more web objects are increasingly annotated with human interpretable labels (i.e., tags), which can be considered as an auxiliary attribute to assist the object classification. Automatically classifying web objects into manageable semantic categories has long been a fundamental pre-process for indexing, browsing, searching, and mining heterogeneous web objects. However, such heterogeneous web objects often suffer from a lack of easy-extractable and uniform descriptive features. In this paper, we propose a discriminative tag-centric model for web object classification by jointly modeling the objects category labels and their corresponding social tags and un-coding the relevance among social tags. Our approach is based on recent techniques for learning large-scale discriminative models. We conduct experiments to validate our approach using real-life data. The results show the feasibility and good performance of our approach.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122688216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Unsupervised discovery of opposing opinion networks from forum discussions 从论坛讨论中无监督地发现反对意见网络
Yue Lu, Hongning Wang, ChengXiang Zhai, D. Roth
{"title":"Unsupervised discovery of opposing opinion networks from forum discussions","authors":"Yue Lu, Hongning Wang, ChengXiang Zhai, D. Roth","doi":"10.1145/2396761.2398489","DOIUrl":"https://doi.org/10.1145/2396761.2398489","url":null,"abstract":"With more and more people freely express opinions as well as actively interact with each other in discussion threads, online forums are becoming a gold mine with rich information about people's opinions and social behaviors. In this paper, we study an interesting new problem of automatically discovering opposing opinion networks of users from forum discussions, which are subset of users who are strongly against each other on some topic. Toward this goal, we propose to use signals from both textual content (e.g., who says what) and social interactions (e.g., who talks to whom) which are both abundant in online forums. We also design an optimization formulation to combine all the signals in an unsupervised way. We created a data set by manually annotating forum data on five controversial topics and our experimental results show that the proposed optimization method outperforms several baselines and existing approaches, demonstrating the power of combining both text analysis and social network analysis in analyzing and generating the opposing opinion networks.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122509966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 36
CGStream: continuous correlated graph query for data streams CGStream:数据流的连续关联图查询
Shirui Pan, Xingquan Zhu
{"title":"CGStream: continuous correlated graph query for data streams","authors":"Shirui Pan, Xingquan Zhu","doi":"10.1145/2396761.2398419","DOIUrl":"https://doi.org/10.1145/2396761.2398419","url":null,"abstract":"In this paper, we propose to query correlated graph in a data stream scenario, where given a query graph q an algorithm is required to retrieve all the subgraphs whose Pearson's correlation coefficients with q are greater than a threshold Θ over some graph data flowing in a stream fashion. Due to the dynamic changing nature of the stream data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. In the paper, we propose a novel algorithm, CGStream, to identify correlated graphs from data stream, by using a sliding window which covers a number of consecutive batches of stream data records. Our theme is to regard stream query as the traversing along a data stream and the query is achieved at a number of outlooks over the data stream. For each outlook, we derive a lower frequency bound to mine a set of frequent subgraph candidates, where the lower bound guarantees that no pattern is missing from the current outlook to the next outlook. On top of that, we derive an upper correlation bound and a heuristic rule to prune the candidate size, which helps reduce the computation cost at each outlook. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithm. Meanwhile, our algorithm achieves good performance in terms of query precision.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122593604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
What is the IQ of your data transformation system? 你的数据转换系统的IQ是多少?
G. Mecca, Paolo Papotti, Salvatore Raunich, Donatello Santoro
{"title":"What is the IQ of your data transformation system?","authors":"G. Mecca, Paolo Papotti, Salvatore Raunich, Donatello Santoro","doi":"10.1145/2396761.2396872","DOIUrl":"https://doi.org/10.1145/2396761.2396872","url":null,"abstract":"Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: \"what is the right tool for my translation task?\" In this paper, we introduce several techniques that contribute to answer this question. Among these, a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user efforts. Based on these techniques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effectiveness, and, ultimately, about their \"intelligence\".","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122921787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
From sBoW to dCoT marginalized encoders for text representation 从sBoW到dCoT,用于文本表示的边缘编码器
Z. Xu, Minmin Chen, Kilian Q. Weinberger, Fei Sha
{"title":"From sBoW to dCoT marginalized encoders for text representation","authors":"Z. Xu, Minmin Chen, Kilian Q. Weinberger, Fei Sha","doi":"10.1145/2396761.2398536","DOIUrl":"https://doi.org/10.1145/2396761.2398536","url":null,"abstract":"In text mining, information retrieval, and machine learning, text documents are commonly represented through variants of sparse Bag of Words (sBoW) vectors (e.g. TF-IDF [1]). Although simple and intuitive, sBoW style representations suffer from their inherent over-sparsity and fail to capture word-level synonymy and polysemy. Especially when labeled data is limited (e.g. in document classification), or the text documents are short (e.g. emails or abstracts), many features are rarely observed within the training corpus. This leads to overfitting and reduced generalization accuracy. In this paper we propose Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features. dCoT explicitly models absent words by removing and reconstructing random sub-sets of words in the unlabeled corpus. With this approach, dCoT learns to reconstruct frequent words from co-occurring infrequent words and maps the high dimensional sparse sBoW vectors into a low-dimensional dense representation. We show that the feature removal can be marginalized out and that the reconstruction can be solved for in closed-form. We demonstrate empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123033017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
CrowdTiles: presenting crowd-based information for event-driven information needs CrowdTiles:呈现基于人群的信息以满足事件驱动的信息需求
S. Whiting, K. Zhou, J. Jose, Omar Alonso, Teerapong Leelanupab
{"title":"CrowdTiles: presenting crowd-based information for event-driven information needs","authors":"S. Whiting, K. Zhou, J. Jose, Omar Alonso, Teerapong Leelanupab","doi":"10.1145/2396761.2398731","DOIUrl":"https://doi.org/10.1145/2396761.2398731","url":null,"abstract":"Time plays a central role in many web search information needs relating to recent events. For recency queries where fresh information is most desirable, there is likely to be a great deal of highly-relevant information created very recently by crowds of people across the world, particularly on platforms such as Wikipedia and Twitter. With so many users, mainstream events are often very quickly reflected in these sources. The English Wikipedia encyclopedia consists of a vast collection of user-edited articles covering a range of topics. During events, users collaboratively create and edit existing articles in near real-time. Simultaneously, users on Twitter disseminate and discuss event details, with a small number of users becoming influential for the topic. In this demo, we propose a novel approach to presenting a summary of new information and users related to recent or ongoing events associated with the user's search topic, therefore aiding most recent information discovery. We outline methods to detect search topics which are driven by events, identify and extract changing Wikipedia article passages and find influential Twitter users. Using these, we provide a system which displays familiar tiles in search results to present recent changes in the event-related Wikipedia articles, as well as Twitter users who have tweeted recent relevant information about the event topics.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":" 14","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114060637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信