Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management最新文献

筛选
英文 中文
Recency ranking by diversification of result set 基于结果集多样化的近期排序
Andrey Styskin, Fedor Romanenko, F. Vorobyev, P. Serdyukov
{"title":"Recency ranking by diversification of result set","authors":"Andrey Styskin, Fedor Romanenko, F. Vorobyev, P. Serdyukov","doi":"10.1145/2063576.2063862","DOIUrl":"https://doi.org/10.1145/2063576.2063862","url":null,"abstract":"In this paper, we propose a web search retrieval approach which automatically detects recency sensitive queries and increases the freshness of the ordinary document ranking by a degree proportional to the probability of the need in recent content. We propose to solve the recency ranking problem by using result diversification principles and deal with the query's non-topical ambiguity appearing when the need in recent content can be detected only with uncertainty. Our offine and online experiments with millions of queries from real search engine users demonstrate the significant increase in satisfaction of users presented with a search result generated by our approach.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"25 1","pages":"1949-1952"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83308498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Managing interoperability and complexity inhealth systems: MIXHS'11 workshop summary 管理卫生系统的互操作性和复杂性:MIXHS'11研讨会总结
M. Bouamrane, C. Tao
{"title":"Managing interoperability and complexity inhealth systems: MIXHS'11 workshop summary","authors":"M. Bouamrane, C. Tao","doi":"10.1145/2063576.2064050","DOIUrl":"https://doi.org/10.1145/2063576.2064050","url":null,"abstract":"Managing Interoperability and Complexity in Health Systems, MIXHS'11, aims to be a forum focussing on recent research and technical results in knowledge management and information systems in bio-medical and electronic health systems. The workshop will provide an opportunity for sharing practical experiences and best practices in e-Health information infrastructure development and management. Of particular interest to the workshop themes are technical solutions to recurring practical systems deployment issues, including harnessing the complexity of bio-medical domain knowledge and the interoperability of heterogeneous health systems. The workshop will gather experts, researchers, system developers, practitioners and policymakers designing and implementing solutions for managing clinical data and integrating existing and future electronic health systems infrastructures.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"115 1","pages":"2635-2636"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89313577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Duplicate detection through structure optimization 通过结构优化进行重复检测
Luís Leitão, P. Calado
{"title":"Duplicate detection through structure optimization","authors":"Luís Leitão, P. Calado","doi":"10.1145/2063576.2063644","DOIUrl":"https://doi.org/10.1145/2063576.2063644","url":null,"abstract":"Detecting and eliminating duplicates in databases is a task of critical importance in many applications. Although solutions for traditional models, such as relational data, have been widely studied, recently there has been some focus on solutions for more complex hierarchical structures as, for instance, XML data. Such data presents many different challenges, among which is the issue of how to exploit the schema structure to determine if two objects are duplicates. In this paper, we argue that structure can indeed have a significant impact on the process of duplicate detection. We propose a novel method that automatically restructures database objects in order to take full advantage of the relations between its attributes. This new structure reflects the relative importance of the attributes in the database and avoids the need to perform a manual selection. To test our approach we applied it to an existing duplicate detection system. Experiments performed on several datasets show that, using the new learned structure, we consistently outperform both the results obtained with the original database structure and those obtained by letting a knowledgeable user manually choose the attributes to compare.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"12 1","pages":"443-452"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89826857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
A pairwise ranking based approach to learning with positive and unlabeled examples 一种基于两两排序的学习方法,使用正样例和未标记样例
Sundararajan Sellamanickam, Priyanka Garg, S. Keerthi
{"title":"A pairwise ranking based approach to learning with positive and unlabeled examples","authors":"Sundararajan Sellamanickam, Priyanka Garg, S. Keerthi","doi":"10.1145/2063576.2063675","DOIUrl":"https://doi.org/10.1145/2063576.2063675","url":null,"abstract":"A large fraction of binary classification problems arising in web applications are of the type where the positive class is well defined and compact while the negative class comprises everything else in the distribution for which the classifier is developed; it is hard to represent and sample from such a broad negative class. Classifiers based only on positive and unlabeled examples reduce human annotation effort significantly by removing the burden of choosing a representative set of negative examples. Various methods have been proposed in the literature for building such classifiers. Of these, the state of the art methods are Biased SVM and Elkan & Noto's methods. While these methods often work well in practice, they are computationally expensive since hyperparameter tuning is very important, particularly when the size of labeled positive examples set is small and class imbalance is high. In this paper we propose a pairwise ranking based approach to learn from positive and unlabeled examples (LPU) and we give a theoretical justification for it. We present a pairwise RankSVM (RSVM) based method for our approach. The method is simple, efficient, and its hyperparameters are easy to tune. A detailed experimental study using several benchmark datasets shows that the proposed method gives competitive classification performance compared to the mentioned state of the art methods, while training 3-10 times faster. We also propose an efficient AUC based feature selection technique in the LPU setting and demonstrate its usefulness on the datasets. To get an idea of the goodness of the LPU methods we compare them against supervised learning (SL) methods that also make use of negative examples in training. SL methods give a slightly better performance than LPU methods when there is a rich set of negative examples; however, they are inferior when the number of negative training examples is not large enough.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"7 1","pages":"663-672"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89862696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
Ranking-based processing of SQL queries 基于排名的SQL查询处理
H. Azzam, T. Roelleke, Sirvan Yahyaei
{"title":"Ranking-based processing of SQL queries","authors":"H. Azzam, T. Roelleke, Sirvan Yahyaei","doi":"10.1145/2063576.2063614","DOIUrl":"https://doi.org/10.1145/2063576.2063614","url":null,"abstract":"A growing number of applications are built on top of search engines and issue complex structured queries. This paper contributes a customisable ranking-based processing of such queries, specifically SQL. Similar to how term-based statistics are exploited by term-based retrieval models, ranking-aware processing of SQL queries exploits tuple-based statistics that are derived from sources or, more precisely, derived from the relations specified in the SQL query. To implement this ranking-based processing, we leverage PSQL, a probabilistic variant of SQL, to facilitate probability estimation and the generalisation of document retrieval models to be used for tuple retrieval. The result is a general-purpose framework that can interpret any SQL query and then assign a probabilistic retrieval model to rank the results of that query. The evaluation on the IMDB and Monster benchmarks proves that the PSQL-based approach is applicable to (semi-)structured and unstructured data and structured queries.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"35 1","pages":"231-236"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86505244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Content-driven detection of campaigns in social media 内容驱动的社交媒体活动检测
Kyumin Lee, James Caverlee, Zhiyuan Cheng, D. Sui
{"title":"Content-driven detection of campaigns in social media","authors":"Kyumin Lee, James Caverlee, Zhiyuan Cheng, D. Sui","doi":"10.1145/2063576.2063658","DOIUrl":"https://doi.org/10.1145/2063576.2063658","url":null,"abstract":"We study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns -- ranging from coordinated spam messages to promotional and advertising campaigns to political astro-turfing -- are growing in significance and reach with the commensurate rise of massive-scale social systems. Often linked by common \"talking points\", there has been little research in detecting these campaigns. Hence, we propose and evaluate a content-driven framework for effectively linking free text posts with common \"talking points\" and extracting campaigns from large-scale social media. One of the salient aspects of the framework is an investigation of graph mining techniques for isolating coherent campaigns from large message-based graphs. Through an experimental study over millions of Twitter messages we identify five major types of campaigns -- Spam, Promotion, Template, News, and Celebrity campaigns -- and we show how these campaigns may be extracted with high precision and recall.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"12 1","pages":"551-556"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87806973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 39
WikiLabel: an encyclopedic approach to labeling documents en masse 维基标签:一个百科全书式的标记文档的方法
Tadashi Nomoto
{"title":"WikiLabel: an encyclopedic approach to labeling documents en masse","authors":"Tadashi Nomoto","doi":"10.1145/2063576.2063961","DOIUrl":"https://doi.org/10.1145/2063576.2063961","url":null,"abstract":"This paper presents a particular approach to collective labeling of multiple documents, which works by associating the documents with Wikipedia pages and labeling them with headings the pages carry. The approach has an obvious advantage over past approaches in that it is able to produce fluent labels, as they are hand-written by human editors. We carried out some experiments on the TDT5 dataset, which found that the approach works rather robustly for an arbitrary set of documents in the news domain. Comparisons were made with some baselines, including the state of the art, with results strongly in favor of our approach.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"124 1","pages":"2341-2344"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87831992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Latent feature encoding using dyadic and relational data 使用二进和关系数据的潜在特征编码
S. Ando
{"title":"Latent feature encoding using dyadic and relational data","authors":"S. Ando","doi":"10.1145/2063576.2063926","DOIUrl":"https://doi.org/10.1145/2063576.2063926","url":null,"abstract":"Learning from dyadic and relational data is a fundamental problem for IR and KDD applications in web and social media domain. Basic behaviors and characteristics of users and documents are typically described by a collection of dyads, i.e., pairs of entities. Discriminative features extracted from such data are essential in exploratory and discriminatory analyses. Relational properties of the entities reflect pair-wise similarities and their collective community structure which are also valuable for discriminative learning. A challenging aspect of learning from the relational data in many domains, is that the generative process of relational links appears noisy and is not well described by a stochastic model.\u0000 In this paper, we present a principled approach for learning discriminative features from heterogeneous sources of dyadic and relational data. We propose an information-theoretic framework called Latent Feature Encoding (LFE) which projects the entities and the links to a latent feature space in the analogy of -encoding. Projection is formalized as a maximization of the mutual information preserved in the latent features, regularized by the compression rate of encoding. The regularization is emphasized over more probable links to account for the noisiness of the observation. An empirical evaluation of the proposed method using text and social media datasets is presented. Performances in supervised and unsupervised learning tasks are compared with those of conventional latent feature extraction methods.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"44 1","pages":"2201-2204"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88134749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach twitter中的主题情感分析:一种基于图的标签情感分类方法
Xiaolong Wang, Furu Wei, Xiaohua Liu, M. Zhou, Ming Zhang
{"title":"Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach","authors":"Xiaolong Wang, Furu Wei, Xiaohua Liu, M. Zhou, Ming Zhang","doi":"10.1145/2063576.2063726","DOIUrl":"https://doi.org/10.1145/2063576.2063726","url":null,"abstract":"Twitter is one of the biggest platforms where massive instant messages (i.e. tweets) are published every day. Users tend to express their real feelings freely in Twitter, which makes it an ideal source for capturing the opinions towards various interesting topics, such as brands, products or celebrities, etc. Naturally, people may anticipate an approach to receiving the common sentiment tendency towards these topics directly rather than through reading the huge amount of tweets about them. On the other side, Hashtags, starting with a symbol \"#\" ahead of keywords or phrases, are widely used in tweets as coarse-grained topics. In this paper, instead of presenting the sentiment polarity of each tweet relevant to the topic, we focus our study on hashtag-level sentiment classification. This task aims to automatically generate the overall sentiment polarity for a given hashtag in a certain time period, which markedly differs from the conventional sentence-level and document-level sentiment analysis. Our investigation illustrates that three types of information is useful to address the task, including (1) sentiment polarity of tweets containing the hashtag; (2) hashtags co-occurrence relationship and (3) the literal meaning of hashtags. Consequently, in order to incorporate the first two types of information into a classification framework where hashtags can be classified collectively, we propose a novel graph model and investigate three approximate collective classification algorithms for inference. Going one step further, we show that the performance can be remarkably improved using an enhanced boosting classification setting in which we employ the literal meaning of hashtags as semi-supervised information. Experimental results on a real-life data set consisting of 29,195 tweets and 2,181 hashtags show the effectiveness of the proposed model and algorithms.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"32 1","pages":"1031-1040"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85939031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 470
Effective stratification for low selectivity queries on deep web data sources 深层网络数据源低选择性查询的有效分层
Fan Wang, G. Agrawal
{"title":"Effective stratification for low selectivity queries on deep web data sources","authors":"Fan Wang, G. Agrawal","doi":"10.1145/2063576.2063786","DOIUrl":"https://doi.org/10.1145/2063576.2063786","url":null,"abstract":"We study the problem of estimating the result of an aggregation query with low selectivity when a data source only supports limited data accesses. Existing stratified sampling techniques cannot be applied to such a problem since either it is very hard, if not impossible, to gather certain critical statistics from such a data source, or more importantly, the selective attribute of the query may not be queriable on the data source. In such cases, we need an effective mechanism to stratify the data and form homogeneous strata with respect to the selective attribute of the query, despite not being able to query the data source with the selective attribute.\u0000 This paper presents and evaluates a stratification method for this problem utilizing a queriable auxiliary attribute. The breaking points for the stratification are computed based on a novel Bayesian Adaptive Harmony Search algorithm. This method derives from the existing Harmony search method, but includes novel objective function, and introduces a technique for dynamically adapting key parameters of this method. Our experiments show that the estimation accuracy achieved using our method is consistently higher than 95% even for 0.01% selectivity query, even when there is only a low correlation between the auxiliary attribute and the selective attribute. Furthermore, our method achieves at least a five fold reduction in estimation error over three other methods, for the same sampling cost.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"223 1","pages":"1455-1464"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86801169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信