Proceedings of the 18th ACM conference on Information and knowledge management: Latest Papers

Exploit the tripartite network of social tagging for web clustering
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646167
Caimei Lu, Xin Chen, Eun Kyo Park
{"title":"Exploit the tripartite network of social tagging for web clustering","authors":"Caimei Lu, Xin Chen, Eun Kyo Park","doi":"10.1145/1645953.1646167","DOIUrl":"https://doi.org/10.1145/1645953.1646167","url":null,"abstract":"In this poster, we investigate how to enhance web clustering by leveraging the tripartite network of social tagging systems. We propose a clustering method, called \"Tripartite Clustering\", which clusters the three types of nodes (resources, users and tags) simultaneously based on the links in the social tagging network. The proposed method is tested on a real-world social tagging dataset sampled from del.icio.us. We also compare the proposed clustering approach with K-means. All the clustering results are evaluated against a human-maintained web directory. The experimental results show that Tripartite Clustering significantly outperforms the content-based K-means approach and achieves performance close to that of social annotation-based K-means while generating much more useful information.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121889087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
SPIDER: a system for scalable, parallel / distributed evaluation of large-scale RDF data
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646315
Hyunsik Choi, Jihoon Son, YongHyun Cho, M. Sung, Y. Chung
{"title":"SPIDER: a system for scalable, parallel / distributed evaluation of large-scale RDF data","authors":"Hyunsik Choi, Jihoon Son, YongHyun Cho, M. Sung, Y. Chung","doi":"10.1145/1645953.1646315","DOIUrl":"https://doi.org/10.1145/1645953.1646315","url":null,"abstract":"RDF is a data model for representing labeled directed graphs, and it is used as an important building block of the semantic web. Due to its flexibility and applicability, RDF has been used in applications such as the semantic web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common. However, existing techniques do not manage them effectively. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework, Hadoop. SPIDER consists of two major modules: (1) the graph data loader and (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of the data over multiple servers. The query processor parses the user query and distributes sub queries to cluster nodes. Also, the results of sub queries from multiple servers are gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop. In addition, our system supports some features of the SPARQL query language. This prototype will be a foundation for developing real applications with large-scale RDF graph data.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"122 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120861616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
A machine learning approach for improved BM25 retrieval
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646237
K. Svore, C. Burges
{"title":"A machine learning approach for improved BM25 retrieval","authors":"K. Svore, C. Burges","doi":"10.1145/1645953.1646237","DOIUrl":"https://doi.org/10.1145/1645953.1646237","url":null,"abstract":"Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness over BM25 and BM25F. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures. We demonstrate the advantages of our model on a very large real-world Web data collection.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125769969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
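For readers unfamiliar with the baseline, the quantities that plain BM25 consumes can be seen in the following minimal sketch. It uses the common textbook Okapi BM25 form (with the `+ 1` inside the logarithm so that IDF stays positive) and the conventional defaults `k1=1.2`, `b=0.75`; the function name, the toy corpus, and those defaults are assumptions made for illustration, and this is not the learned LambdaRank-based model the abstract describes.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 score of one document for one query (illustrative only)."""
    term_freq = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        f = term_freq.get(term, 0)
        if f == 0:
            continue  # terms absent from the document contribute nothing
        df = doc_freq.get(term, 0)
        # Inverse document frequency, smoothed so it is always positive
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term-frequency saturation with document-length normalization
        norm = f + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1.0) / norm
    return score

# Hypothetical toy corpus, not data from the paper
docs = [["bm25", "ranking", "bm25"], ["ranking", "function"], ["web", "search"]]
doc_freq = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(doc) for doc in docs) / len(docs)
scores = [bm25_score(["bm25"], doc, doc_freq, len(docs), avg_len) for doc in docs]
```

The two hand-set parameters `k1` and `b` are what usually require tuning in BM25; the abstract states that the learned model avoids this tuning step.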
Fast shortest path distance estimation in large networks
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646063
Michalis Potamias, F. Bonchi, C. Castillo, A. Gionis
{"title":"Fast shortest path distance estimation in large networks","authors":"Michalis Potamias, F. Bonchi, C. Castillo, A. Gionis","doi":"10.1145/1645953.1646063","DOIUrl":"https://doi.org/10.1145/1645953.1646063","url":null,"abstract":"In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods involve selecting a subset of nodes as landmarks and computing offline the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, it can be estimated quickly by combining the precomputed distances. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. We therefore explore theoretical insights to devise a variety of simple methods that scale well in very large networks. The efficiency of the suggested techniques is tested experimentally using five real-world graphs having millions of edges. While theoretical bounds support the claim that random landmarks work well in practice, our extensive experimentation shows that smart landmark selection can yield dramatically more accurate results: for a given target accuracy, our methods require as much as 250 times less space than selecting landmarks at random. In addition, we demonstrate that at a very small accuracy loss our techniques are several orders of magnitude faster than the state-of-the-art exact methods. Finally, we study an application of our methods to the task of social search in large graphs.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125078899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 317
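The offline/online split described in this abstract can be illustrated with a small sketch. It assumes an unweighted, undirected, connected graph stored as an adjacency dictionary, and it takes the landmark set as given rather than selecting it; the function names and the toy graph are hypothetical. Combining the precomputed distances relies on the triangle inequality: for any landmark `u`, `|d(s,u) - d(u,t)| <= d(s,t) <= d(s,u) + d(u,t)`.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def precompute(graph, landmarks):
    """Offline step: one BFS per landmark."""
    return {u: bfs_distances(graph, u) for u in landmarks}

def estimate_bounds(index, s, t):
    """Online step: triangle-inequality bounds on d(s, t)."""
    upper = min(d[s] + d[t] for d in index.values())
    lower = max(abs(d[s] - d[t]) for d in index.values())
    return lower, upper

# Hypothetical toy graph: path a-b-c-d-e with an extra branch c-f
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
    "d": ["c", "e"], "e": ["d"], "f": ["c"],
}
index = precompute(graph, ["c", "e"])
bounds = estimate_bounds(index, "a", "f")  # true distance is 3 (a-b-c-f)
```

Here landmark `c` lies on the shortest path between `a` and `f`, so the upper bound is exact; a landmark far from both endpoints, such as `e`, gives a much looser bound, which is one intuition for why landmark selection matters.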
Learning to rank with a novel kernel perceptron method
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646018
Xue-wen Chen, Haixun Wang, Xiaotong Lin
{"title":"Learning to rank with a novel kernel perceptron method","authors":"Xue-wen Chen, Haixun Wang, Xiaotong Lin","doi":"10.1145/1645953.1646018","DOIUrl":"https://doi.org/10.1145/1645953.1646018","url":null,"abstract":"While conventional ranking algorithms, such as the PageRank, rely on the web structure to decide the relevancy of a web page, learning to rank seeks a function capable of ordering a set of instances using a supervised learning approach. Learning to rank has gained increasing popularity in information retrieval and machine learning communities. In this paper, we propose a novel nonlinear perceptron method for rank learning. The proposed method is an online algorithm and simple to implement. It introduces a kernel function to map the original feature space into a nonlinear space and employs a perceptron method to minimize the ranking error by avoiding converging to a solution near the decision boundary and alleviating the effect of outliers in the training dataset. Furthermore, unlike existing approaches such as RankSVM and RankBoost, the proposed method is scalable to large datasets for online learning. Experimental results on benchmark corpora show that our approach is more efficient and achieves higher or comparable accuracies in instance ranking than state of the art methods such as FRank, RankSVM and RankBoost.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129867602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
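To make the general idea of a kernelized perceptron for ranking concrete, here is a minimal sketch of a plain pairwise kernel perceptron. It is a generic textbook-style construction, not the method proposed in the paper: it has none of the boundary-avoidance or outlier handling the abstract mentions. The RBF kernel, the function names, and the toy pairs are all assumptions for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def train_pairwise_perceptron(pairs, epochs=10, kernel=rbf):
    """pairs: list of (better, worse) feature vectors. Returns a scoring function."""
    alphas = [0] * len(pairs)  # number of updates triggered by each pair

    def score(x):
        # f(x) = sum_i alpha_i * (k(better_i, x) - k(worse_i, x))
        return sum(
            a * (kernel(better, x) - kernel(worse, x))
            for a, (better, worse) in zip(alphas, pairs)
            if a
        )

    for _ in range(epochs):
        mistakes = 0
        for i, (better, worse) in enumerate(pairs):
            if score(better) - score(worse) <= 0:  # pair is mis-ordered
                alphas[i] += 1                     # online perceptron update
                mistakes += 1
        if mistakes == 0:
            break
    return score

# Hypothetical toy data: a larger feature value should rank higher
pairs = [([1.0], [0.0]), ([2.0], [1.0])]
score = train_pairwise_perceptron(pairs)
```

Each update touches a single counter, which is what makes this family of methods online; the cost of evaluating `score` grows with the number of pairs that have triggered updates.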
Acronym extraction and disambiguation in large-scale organizational web pages
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646206
Shicong Feng, Yuhong Xiong, Conglei Yao, Liwei Zheng, W. Liu
{"title":"Acronym extraction and disambiguation in large-scale organizational web pages","authors":"Shicong Feng, Yuhong Xiong, Conglei Yao, Liwei Zheng, W. Liu","doi":"10.1145/1645953.1646206","DOIUrl":"https://doi.org/10.1145/1645953.1646206","url":null,"abstract":"In this paper, we focus on the automatic extraction and disambiguation of acronyms in large-scale organizational web pages, which is important but difficult due to the diversity of acronyms and the scale of organizational web pages. We propose two novel algorithms to address the key problems in acronym extraction and disambiguation: (1) An unsupervised ranking algorithm to automatically filter out the incorrect acronym-expansion pairs. Different from the existing approaches, our method does not require any hand-crafted rules; (2) A graph-based algorithm to disambiguate ambiguous acronyms, which leverages the hyperlinks of pages to facilitate the acronym disambiguation. We evaluate the proposed approaches using two large-scale, real-world datasets in two different domains. Our experimental results show that our approach is domain independent, and achieves higher precision and recall than the existing methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128420456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Session details: KM classification and clustering II
Joost Kok
{"title":"Session details: KM classification and clustering II","authors":"Joost Kok","doi":"10.1145/3261240","DOIUrl":"https://doi.org/10.1145/3261240","url":null,"abstract":"","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128452320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fragment-based clustering ensembles
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646232
Ou Wu, Mingliang Zhu, Weiming Hu
{"title":"Fragment-based clustering ensembles","authors":"Ou Wu, Mingliang Zhu, Weiming Hu","doi":"10.1145/1645953.1646232","DOIUrl":"https://doi.org/10.1145/1645953.1646232","url":null,"abstract":"Clustering ensembles combine different clustering solutions into a single robust and stable one. Most existing methods become highly time-consuming when the data size becomes large. In this paper, we study the properties of the defined 'clustering fragment' and put forward a useful proposition. Solid proofs are presented with two widely used goodness measures for clustering ensembles. Finally, a new ensemble framework termed fragment-based clustering ensembles is proposed. Theoretically, most existing methods can be improved by adopting this framework. To evaluate the proposed framework, three new methods are introduced by bringing three popular clustering ensemble methods into our framework. The experimental results on several public data sets show that the three introduced methods are greatly improved in computational complexity and also achieve accuracy better than or similar to that of the original methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129379782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
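The abstract does not define "clustering fragment". The sketch below assumes one plausible reading: a fragment is a maximal group of data points that no base clustering ever separates, so the ensemble step can operate on fragments instead of individual points. That definition, the function name, and the toy labels are assumptions for illustration, not taken from the paper.

```python
from collections import defaultdict

def clustering_fragments(base_clusterings):
    """Group point indices whose labels agree in every base clustering.

    base_clusterings: list of label lists, one per base clustering,
    all of the same length (one label per data point).
    """
    groups = defaultdict(list)
    # zip(*...) yields, for each point, its tuple of labels across all clusterings
    for point, signature in enumerate(zip(*base_clusterings)):
        groups[signature].append(point)
    return list(groups.values())

# Hypothetical labels for six points under three base clusterings
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
fragments = clustering_fragments(base_clusterings)
```

In this toy case six points collapse into four fragments. Under the assumed definition, the saving grows when the base clusterings agree on large groups of points, which would be consistent with the reduction in computational cost reported in the abstract.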
Clustering queries for better document ranking
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646174
Yi Liu, Liangjie Zhang, Ruihua Song, Jian-Yun Nie, Ji-Rong Wen
{"title":"Clustering queries for better document ranking","authors":"Yi Liu, Liangjie Zhang, Ruihua Song, Jian-Yun Nie, Ji-Rong Wen","doi":"10.1145/1645953.1646174","DOIUrl":"https://doi.org/10.1145/1645953.1646174","url":null,"abstract":"Different queries require different ranking methods. It is however challenging to determine what queries are similar, and how to rank documents for them. In this paper, we propose a new method to cluster queries according to the similarity determined based on URLs in their answers. We then train specific ranking models for each query cluster. In addition, a cluster-specific measure of authority is defined to favor documents from authoritative websites on the corresponding topics. The proposed approach is tested using data from a search engine. It turns out that our proposed topic-dependent models can significantly improve the search results of eight most popular categories of queries.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125647744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
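One simple way to picture similarity "based on URLs in their answers" is set overlap between the result URLs of two queries. The sketch below uses Jaccard similarity and a greedy threshold-based grouping; both choices, along with the threshold value, the function names, and the example URLs, are assumptions for illustration and are not stated in the abstract.

```python
def jaccard(urls_a, urls_b):
    """Jaccard similarity between two collections of result URLs."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_queries(results, threshold=0.3):
    """Greedily place each query in the first cluster containing a similar query."""
    clusters = []
    for query, urls in results.items():
        for cluster in clusters:
            if any(jaccard(urls, results[other]) >= threshold for other in cluster):
                cluster.append(query)
                break
        else:
            clusters.append([query])  # no similar cluster found: start a new one
    return clusters

# Hypothetical search results keyed by query
results = {
    "python tutorial": ["docs.example/py", "learn.example/py"],
    "learn python": ["learn.example/py", "docs.example/py", "blog.example/py"],
    "cheap flights": ["fly.example", "trips.example"],
}
clusters = cluster_queries(results)
```

Once queries are grouped, a separate ranking model could be trained per cluster, which is the step the abstract describes next.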
Efficient feature weighting methods for ranking
Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646100
Hwanjo Yu, Jinoh Oh, Wook-Shin Han
{"title":"Efficient feature weighting methods for ranking","authors":"Hwanjo Yu, Jinoh Oh, Wook-Shin Han","doi":"10.1145/1645953.1646100","DOIUrl":"https://doi.org/10.1145/1645953.1646100","url":null,"abstract":"Feature weighting or selection is a crucial process to identify an important subset of features from a data set. Removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval. Due to fundamental differences between classification and ranking, feature weighting methods developed for classification cannot be readily applied to feature weighting for ranking. A state-of-the-art feature selection method for ranking, called GAS, has been recently proposed, which exploits the importance of each feature and the similarity between every pair of features. However, GAS must compute the similarity scores of all pairs of features, thus it is not scalable for high-dimensional data and its performance degrades on nonlinear ranking functions. This paper proposes novel algorithms, RankWrapper and RankFilter, which are scalable for high-dimensional data and also perform reasonably well on nonlinear ranking functions. RankWrapper and RankFilter are designed based on the key idea of the Relief algorithm. Relief is a feature selection algorithm for classification, which exploits the notions of hits (data points within the same class) and misses (data points from different classes) for classification. However, there is no such notion of hits or misses in ranking. The proposed algorithms instead utilize the ranking distances of nearest data points in order to identify the key features for ranking. Our extensive experiments show that RankWrapper and RankFilter generate higher accuracy overall than the GAS and traditional Relief algorithms adapted for ranking, and run substantially faster than the GAS on high dimensional data.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126947425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
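The hits-and-misses idea behind Relief, which the abstract uses as its starting point, can be sketched for the classification setting as follows. This is a simplified, deterministic variant of classic Relief (it visits every instance instead of sampling, and assumes features already scaled to [0, 1]); it is not RankWrapper or RankFilter, and the function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Simplified Relief feature weights for a labeled data set.

    X: list of feature vectors scaled to [0, 1]; y: list of class labels.
    A feature gains weight when it differs on the nearest miss (other class)
    and loses weight when it differs on the nearest hit (same class).
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))  # Manhattan distance

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot form both neighbors for this instance
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the informative feature receives a positive weight and the noisy one a negative weight. Ranking data has no class labels to define hits and misses, which is the gap the abstract says the proposed algorithms address with ranking distances.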